How can I tell if a file is binary (non-text) in Python?
I am searching through a large set of files in Python and keep getting matches in binary files, which makes the output incredibly messy.
I know I could use grep -I, but I am doing more with the data than grep allows.
In the past, I would have just searched for characters greater than 0x7f, but UTF-8 and similar encodings make that impossible on modern systems. Ideally, the solution would be fast.
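A content-based heuristic can still work if it flags control bytes instead of high bytes. The following is a minimal sketch in the spirit of grep -I. It is a different technique from the extension-based mimetypes answer. It reads only the first block of each file and calls the file binary if that block contains bytes that rarely appear in text, such as NUL. Bytes at or above 0x80 count as text, so UTF-8 files pass. The exact byte set, the 1024-byte block size, and the helper names `looks_binary` and `is_binary_file` are illustrative assumptions.

```python
# Assumed set of bytes that commonly occur in text files: BEL, BS, TAB, LF, FF,
# CR, ESC, plus everything from 0x20 upward except DEL (0x7f). Bytes >= 0x80
# are treated as text so that UTF-8 (and other 8-bit encodings) are not flagged.
TEXT_BYTES = bytes({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7F})


def looks_binary(data: bytes) -> bool:
    # Delete every "text" byte; anything left over is a control byte such as NUL.
    return bool(data.translate(None, TEXT_BYTES))


def is_binary_file(path: str, blocksize: int = 1024) -> bool:
    # Only the first block is inspected, which keeps the check fast on large files.
    with open(path, "rb") as f:
        return looks_binary(f.read(blocksize))


print(looks_binary(b"plain ASCII\n"))
print(looks_binary("caf\u00e9".encode("utf-8")))
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00"))
```

Because the check is byte-based, text stored as UTF-16 (which contains many NUL bytes) will be reported as binary. An empty file is reported as text.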
You can also use the mimetypes module:
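import mimetypes
...
# guess_type() looks only at the file name/extension, not the contents.
# It returns a (type, encoding) tuple, e.g. ('text/plain', None) for
# "notes.txt", and (None, None) when the extension is unknown.
mime, encoding = mimetypes.guess_type(file)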
It's fairly easy to compile a list of binary MIME types. For example, Apache ships a mime.types file that you could parse into two sets, binary and text, and then check whether the guessed MIME type is in the text set or the binary set. Keep in mind that this classifies files by extension only, so a binary file named with a .txt extension is still reported as text/plain.
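The following is a minimal sketch of that idea. It assumes mime.types-style input, with one `<type> ext ext ...` entry per line and `#` starting a comment. It also applies a hypothetical rule: every `text/*` type counts as text and every other listed type as binary. The helpers `parse_mime_types`, `split_text_binary`, and `classify` are illustrative names, not part of any library. The only standard-library call involved is `mimetypes.guess_type`, as in the answer.

```python
import mimetypes


def parse_mime_types(text: str) -> set[str]:
    """Collect the MIME types listed in mime.types-style text.

    Each non-comment line is assumed to look like '<type> [ext ...]';
    everything after '#' is treated as a comment.
    """
    types = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            types.add(line.split()[0])
    return types


def split_text_binary(types: set[str]) -> tuple[set[str], set[str]]:
    # Hypothetical rule: text/* is text, every other listed type is binary.
    text_types = {t for t in types if t.startswith("text/")}
    return text_types, types - text_types


def classify(path: str, text_types: set[str], binary_types: set[str]) -> str:
    # guess_type() uses only the file name, so the file is never opened.
    mime, _encoding = mimetypes.guess_type(path)
    if mime in text_types:
        return "text"
    if mime in binary_types:
        return "binary"
    return "unknown"  # extension not recognised or not in the parsed list


# Small inline stand-in for Apache's mime.types.
SAMPLE = """\
# MIME type        Extensions
text/plain         txt
application/pdf    pdf
image/png          png
"""

text_types, binary_types = split_text_binary(parse_mime_types(SAMPLE))
for name in ["notes.txt", "report.pdf", "logo.png", "README"]:
    print(name, classify(name, text_types, binary_types))
```

With a real installation, you would pass the contents of Apache's mime.types file instead of `SAMPLE`; its location varies by system. Files with no recognised extension fall into "unknown" and may need a content-based check.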