I'm trying to parse an HTML page that I retrieved through pyCurl, but the pyCurl WRITEFUNCTION returns the page as bytes rather than as a string, so I'm unable to parse it with BeautifulSoup.
Is there any way to convert io.BytesIO to io.StringIO?
Or is there any other way to parse the HTML page?
I'm using Python 3.3.2.
A naive approach:
# Assume bytes_io is an `io.BytesIO` object that already holds the page
byte_str = bytes_io.getvalue()  # unlike read(), this does not depend on the current stream position
# Decode the bytes into a str (Python 3 text)
text_obj = byte_str.decode('UTF-8')  # or use the encoding you expect
# Use text_obj however you see fit!
# io.StringIO(text_obj) gives you a StringIO object if that's what you need
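For reference, here is a slightly fuller sketch of the same idea that runs without pyCurl. The sample HTML, the UTF-8 encoding, and the variable names are assumptions made for illustration; in real code the buffer would be filled by passing `buffer.write` as the WRITEFUNCTION callback. It shows two ways to get from bytes to text: decoding and wrapping in `io.StringIO`, or layering `io.TextIOWrapper` over the byte stream.

```python
import io

# Stand-in for the pyCurl step: the callback is called by hand here.
buffer = io.BytesIO()
buffer.write("<html><body><p>héllo</p></body></html>".encode("utf-8"))

# After writing, the stream position sits at the end, so read() would
# return b"". getvalue() returns the whole buffer regardless of position.
raw = buffer.getvalue()

# Decode with the encoding the server actually used (UTF-8 is assumed).
text = raw.decode("utf-8")

# Option 1: wrap the decoded text in a StringIO.
string_io = io.StringIO(text)

# Option 2: rewind the byte stream and read it through a text layer.
buffer.seek(0)
wrapper = io.TextIOWrapper(buffer, encoding="utf-8")
wrapped_text = wrapper.read()
```

Either `text` or `string_io` can then be handed to the parser. BeautifulSoup is also documented to accept byte strings and guess the encoding itself, so passing `raw` directly may work without any conversion.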