I am new to pandas and I can't figure out how to arrange my time series. Take a look at it:
date & time of connection
19/06/2017 12:39
19/06/2017 12:40
19/06/2017 13:11
20/06/2017 12:02
20/06/2017 12:04
21/06/2017 09:32
21/06/2017 18:23
21/06/2017 18:51
21/06/2017 19:08
21/06/2017 19:50
22/06/2017 13:22
22/06/2017 13:41
22/06/2017 18:01
23/06/2017 16:18
23/06/2017 17:00
23/06/2017 19:25
23/06/2017 20:58
23/06/2017 21:03
23/06/2017 21:05
This is a sample from a dataset of 130k rows. I tried:
df.groupby('date & time of connection')['date & time of connection'].apply(list)
I guess that is not enough.
I think I should:
What do you think of my logic? Do you know of any tutorials? Thank you very much.
You can use dt.floor to truncate the datetimes to dates, and then count the rows per day with value_counts, or with groupby and size:
df = (pd.to_datetime(df['date & time of connection'])
        .dt.floor('d')
        .value_counts()
        .rename_axis('date')
        .reset_index(name='count'))
print(df)
        date  count
0 2017-06-23      6
1 2017-06-21      5
2 2017-06-19      3
3 2017-06-22      3
4 2017-06-20      2
Or:
s = pd.to_datetime(df['date & time of connection'])
df = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
print(df)
  date & time of connection  count
0                2017-06-19      3
1                2017-06-20      2
2                2017-06-21      5
3                2017-06-22      3
4                2017-06-23      6
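The sample timestamps are written day-first (19/06/2017), so it may be safer to tell pandas the format explicitly instead of letting it guess. Below is a minimal, self-contained sketch of the same counting idea. The format string '%d/%m/%Y %H:%M' is an assumption based on the sample rows shown in the question, and the small `raw` list is only illustrative data copied from that sample:

```python
import pandas as pd

# A few rows copied from the sample in the question (day-first timestamps).
raw = [
    "19/06/2017 12:39", "19/06/2017 12:40", "19/06/2017 13:11",
    "20/06/2017 12:02", "20/06/2017 12:04",
    "21/06/2017 09:32",
]
df = pd.DataFrame({"date & time of connection": raw})

# Assumed format: an explicit format avoids day/month ambiguity
# for dates such as 01/06/2017.
s = pd.to_datetime(df["date & time of connection"], format="%d/%m/%Y %H:%M")

# Floor each timestamp to midnight, count rows per day,
# and sort chronologically instead of by frequency.
counts = (s.dt.floor("D")
           .value_counts()
           .sort_index()
           .rename_axis("date")
           .reset_index(name="count"))
print(counts)
```

If the real file also contains seconds, the format string would need a trailing ':%S'.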
Timings:
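import numpy as np
import pandas as pd

# Build a test frame: timestamps every 37 minutes, with a random subset of rows dropped
np.random.seed(1542)
N = 220000
a = np.unique(np.random.randint(N, size=int(N/2)))
df = pd.DataFrame(pd.date_range('2000-01-01', freq='37min', periods=N)).drop(a)
df.columns = ['date & time of connection']
# Store the timestamps as day-first strings, like the data in the question
df['date & time of connection'] = df['date & time of connection'].dt.strftime('%d/%m/%Y %H:%M:%S')
print(df.head())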
In [193]: %%timeit
     ...: df['date & time of connection'] = pd.to_datetime(df['date & time of connection'])
     ...: df1 = df.groupby(by=df['date & time of connection'].dt.date).count()
     ...:
539 ms ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [194]: %%timeit
     ...: df1 = (pd.to_datetime(df['date & time of connection'])
     ...:          .dt.floor('d')
     ...:          .value_counts()
     ...:          .rename_axis('date')
     ...:          .reset_index(name='count'))
     ...:
12.4 ms ± 350 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [195]: %%timeit
     ...: s = pd.to_datetime(df['date & time of connection'])
     ...: df2 = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
     ...:
17.7 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
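One plausible reason for the gap (not measured here) is that .dt.date returns a column of Python datetime.date objects with object dtype, while .dt.floor keeps the values as datetime64, which pandas can group without falling back to Python objects. A small sketch with made-up timestamps shows the dtype difference and that both keys give the same per-day counts:

```python
import pandas as pd

# Hypothetical timestamps, two per day, only for illustration.
s = pd.Series(pd.to_datetime([
    "2017-06-19 12:00", "2017-06-19 20:00",
    "2017-06-20 04:00", "2017-06-20 12:00",
]))

as_date = s.dt.date         # Python datetime.date objects, object dtype
as_floor = s.dt.floor("D")  # timestamps truncated to midnight, still datetime64

print(as_date.dtype)
print(as_floor.dtype)

# Grouping by either key yields the same counts per day.
by_date = s.groupby(as_date).size()
by_floor = s.groupby(as_floor).size()
print(by_date.tolist(), by_floor.tolist())
```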