To cut a long story short, I need to open a .bi5 file and read its contents. The problem: I have tens of thousands of .bi5 files containing time-series data that I need to decompress and process (read, dump into pandas).
I ended up installing Python 3 (I normally use 2.7) specifically for the lzma library, because compiling the lzma back-ports for Python 2.7 turned into a nightmare. So I conceded and went with Python 3, but still without success. The problems are too numerous to list here, and no one reads long questions!
I have included one of the .bi5 files. If someone could manage to get it into a pandas DataFrame and show me how they did it, that would be ideal.
P.S. The file is only a few KB, so it will download in a second. Thanks very much in advance.
(File) http://www.filedropper.com/13hticks
The code below should do the trick. It first opens the file and decompresses it with lzma, then uses struct to unpack the binary data.
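import lzma
import struct
import pandas as pd

def bi5_to_df(filename, fmt):
    # Size in bytes of one record described by the struct format string
    chunk_size = struct.calcsize(fmt)
    data = []
    # lzma.open decompresses transparently while reading
    with lzma.open(filename) as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                # One tuple per record
                data.append(struct.unpack(fmt, chunk))
            else:
                # Empty read means end of file
                break
    df = pd.DataFrame(data)
    return df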
The most important thing is to know the right format. I googled around and tried to guess, and '>3i2f' (or '>3I2f') works quite well: big-endian, three 32-bit ints followed by two 32-bit floats. What you suggested, 'i4f', doesn't produce sensible floats, regardless of whether it is read as big or little endian. For struct and its format syntax, see the docs.
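To make the format string concrete, here is a minimal sketch that packs one synthetic record and unpacks it again. The field values are illustrative only; nothing here reads a real .bi5 file.

```python
import struct

FMT = ">3I2f"  # '>' big-endian, '3I' three unsigned 32-bit ints, '2f' two 32-bit floats

# With an explicit byte order there is no padding: 3*4 + 2*4 bytes per record
record_size = struct.calcsize(FMT)

# Round-trip one made-up record (values chosen to be exact in float32)
raw = struct.pack(FMT, 5202, 131964, 131943, 2.25, 1.5)
fields = struct.unpack(FMT, raw)

print(record_size)
print(fields)
```

The record size is what bi5_to_df uses as its read chunk size, so a wrong format string shifts every field after the first mismatch.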
df = bi5_to_df('13h_ticks.bi5', '>3i2f')
df.head()
Out[177]:
      0       1       2     3     4
0   210  110218  110216  1.87  1.12
1   362  110219  110216  1.00  5.85
2   875  110220  110217  1.00  1.12
3  1408  110220  110218  1.50  1.00
4  1884  110221  110219  3.94  1.00
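Since the question mentions tens of thousands of files, reading one small chunk at a time may become slow. A possible variant, sketched here under the assumption that each decompressed file fits comfortably in memory, decompresses the whole payload once and lets struct.iter_unpack walk over it. The name read_bi5_records is hypothetical and not part of any library.

```python
import lzma
import struct

def read_bi5_records(path, fmt=">3I2f"):
    """Return a list of tuples, one per fixed-size record in the file.

    Hypothetical helper: assumes the decompressed payload is a plain
    sequence of records laid out as described by fmt.
    """
    with lzma.open(path) as f:
        payload = f.read()  # decompress everything in one go
    size = struct.calcsize(fmt)
    if len(payload) % size:
        # A partial trailing record would make struct fail; report it clearly
        raise ValueError("payload length %d is not a multiple of %d" % (len(payload), size))
    return list(struct.iter_unpack(fmt, payload))
```

The resulting list of tuples can be passed to pd.DataFrame in the same way as the data list in bi5_to_df.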
Update
To compare the output of bi5_to_df with https://github.com/ninety47/dukascopy, I compiled and ran test_read_bi5 from there. The first lines of the output are:
time, bid, bid_vol, ask, ask_vol
2012-Dec-03 01:00:03.581000, 131.945, 1.5, 131.966, 1.5
2012-Dec-03 01:00:05.142000, 131.943, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.202000, 131.943, 1.5, 131.964, 2.25
2012-Dec-03 01:00:05.321000, 131.944, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.441000, 131.944, 1.5, 131.964, 1.5
And bi5_to_df on the same input file gives:
bi5_to_df('01h_ticks.bi5', '>3I2f').head()
Out[295]:
      0       1       2     3    4
0  3581  131966  131945  1.50  1.5
1  5142  131964  131943  1.50  1.5
2  5202  131964  131943  2.25  1.5
3  5321  131964  131944  1.50  1.5
4  5441  131964  131944  1.50  1.5
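So everything seems to be fine. The values match; ninety47's code just reorders the columns, prints the prices divided by 1000, and shows the time as a full timestamp instead of a millisecond offset.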
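Going by the comparison above, the raw columns appear to be: milliseconds since the start of the hour, ask, bid, ask volume, bid volume. The sketch below labels them accordingly. The function name label_ticks, the hour_start argument, and the point divisor of 1000 are assumptions inferred from this single file; other instruments may need a different divisor.

```python
from datetime import datetime, timedelta

def label_ticks(rows, hour_start, point=1000):
    """Turn raw (time_ms, ask, bid, ask_vol, bid_vol) tuples into dicts.

    Hypothetical helper: column order and the point divisor are inferred
    from the test_read_bi5 comparison, not from an official specification.
    """
    labelled = []
    for time_ms, ask, bid, ask_vol, bid_vol in rows:
        labelled.append({
            "time": hour_start + timedelta(milliseconds=time_ms),
            "bid": bid / point,
            "bid_vol": bid_vol,
            "ask": ask / point,
            "ask_vol": ask_vol,
        })
    return labelled

# Two rows copied from the bi5_to_df output for 01h_ticks.bi5
rows = [(3581, 131966, 131945, 1.5, 1.5), (5202, 131964, 131943, 2.25, 1.5)]
ticks = label_ticks(rows, datetime(2012, 12, 3, 1, 0, 0))
print(ticks[0])
```

A list of dicts like this can be handed to pd.DataFrame, which uses the keys as column names.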
Also, it's probably more accurate to use '>3I2f' instead of '>3i2f' (i.e. unsigned int instead of int).
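The two format codes only disagree when the top bit of a value is set, which is why both give the same output for the files above. A small illustration with made-up bytes:

```python
import struct

# A value with the top bit set is read differently by 'i' and 'I'
raw_high = b"\xff\xff\xff\xfe"
signed_high, = struct.unpack(">i", raw_high)
unsigned_high, = struct.unpack(">I", raw_high)

# A small value such as a price in points reads the same either way
raw_small = struct.pack(">I", 131966)
signed_small, = struct.unpack(">i", raw_small)
unsigned_small, = struct.unpack(">I", raw_small)

print(signed_high, unsigned_high)
print(signed_small, unsigned_small)
```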