How to plot stacked event durations (Gantt chart) using Python Pandas

Date: 2023-03-24

Problem description


    I have a Pandas DataFrame containing the date that a stream gage started measuring flow and the date that the station was decommissioned. I want to generate a plot showing these dates graphically. Here is a sample of my DataFrame:

    import pandas as pd
    
    data = {'index': [40623, 40637, 40666, 40697, 40728, 40735, 40742, 40773, 40796, 40819, 40823, 40845, 40867, 40887, 40945, 40964, 40990, 41040, 41091, 41100], 'StationId': ['UTAHDWQ-5932100', 'UTAHDWQ-5932230', 'UTAHDWQ-5932240', 'UTAHDWQ-5932250', 'UTAHDWQ-5932253', 'UTAHDWQ-5932254', 'UTAHDWQ-5932280', 'UTAHDWQ-5932290', 'UTAHDWQ-5932750', 'UTAHDWQ-5983753', 'UTAHDWQ-5983754', 'UTAHDWQ-5983755', 'UTAHDWQ-5983756', 'UTAHDWQ-5983757', 'UTAHDWQ-5983759', 'UTAHDWQ-5983760', 'UTAHDWQ-5983775', 'UTAHDWQ-5989066', 'UTAHDWQ-5996780', 'UTAHDWQ-5996800'], 'amin': ['1994-07-19 13:15:00', '2006-03-16 13:55:00', '1980-10-31 16:00:00', '1981-06-11 17:45:00', '2006-06-28 13:15:00', '2006-06-28 13:55:00', '1981-06-11 15:30:00', '1992-06-10 15:45:00', '2005-10-03 16:30:00', '2006-04-25 09:56:00', '2006-04-25 11:05:00', '2006-04-25 13:50:00', '2006-04-25 14:20:00', '2006-04-25 12:45:00', '2008-04-08 13:03:00', '2008-04-08 13:15:00', '2008-04-15 12:47:00', '2005-10-04 10:15:00', '1995-03-09 13:59:00', '1995-03-09 15:13:00'], 'amax': ['1998-06-30 14:51:00', '2007-01-24 12:55:00', '2007-07-31 11:35:00', '1990-08-01 08:30:00', '2007-01-24 13:35:00', '2007-01-24 14:05:00', '2006-08-22 16:00:00', '1998-06-30 11:33:00', '2005-10-22 15:00:00', '2006-04-25 10:00:00', '2008-04-08 12:16:00', '2008-04-08 09:10:00', '2008-04-08 09:30:00', '2008-04-08 11:27:00', '2008-04-08 13:05:00', '2008-04-08 13:23:00', '2009-04-07 13:15:00', '2005-10-05 11:40:00', '1996-03-14 10:40:00', '1996-03-14 11:05:00']}
    df = pd.DataFrame(data)
    df.set_index('index', inplace=True)
    
    # display(df.head())
    
                 StationId                 amin                 amax
    index                                                           
    40623  UTAHDWQ-5932100  1994-07-19 13:15:00  1998-06-30 14:51:00
    40637  UTAHDWQ-5932230  2006-03-16 13:55:00  2007-01-24 12:55:00
    40666  UTAHDWQ-5932240  1980-10-31 16:00:00  2007-07-31 11:35:00
    40697  UTAHDWQ-5932250  1981-06-11 17:45:00  1990-08-01 08:30:00
    40728  UTAHDWQ-5932253  2006-06-28 13:15:00  2007-01-24 13:35:00
    

    I want to create a plot similar to this (please note that I did not make this plot using the above data):

    The plot does not have to have the text shown along each line, just the y-axis with station names.

    While this may seem like a niche application of pandas, I know several scientists who would benefit from this plotting ability.

    The closest answers I could find are here:

    • How to plot stacked proportional graph?
    • How to plot two columns of a pandas data frame using points?
    • Matplotlib timelines
    • Create Gantt Plot with python matplotlib

    The last answer is closest to suiting my needs.

    While I would prefer a way to do it through the Pandas wrapper, I would be open to, and grateful for, a straight matplotlib solution.

    Solution

    • I think you are trying to create a Gantt plot.
    • One suggested approach is to draw each duration with matplotlib's hlines.
    • Tested in matplotlib 3.4.2

    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as dt
    
    # using df from the OP
    
    # convert columns to a datetime dtype
    df.amin = pd.to_datetime(df.amin)
    df.amax = pd.to_datetime(df.amax)
    
    fig, ax = plt.subplots(figsize=(8, 5))
    # treat x values as dates; do not rebind ax, because xaxis_date() returns None
    ax.xaxis_date()
    # one horizontal line per station, from its first to its last date
    ax.hlines(df.index, dt.date2num(df.amin), dt.date2num(df.amax))
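
The snippet above places each line at its numeric index value, so the y-axis shows numbers such as 40623 rather than station names, which is what the question asks for. One possible way to get station names is to plot against evenly spaced row positions and then relabel the ticks. The following is only a sketch, not part of the original answer: it rebuilds a three-row subset of the question's data so it runs on its own, and `y_pos`, the `linewidth` value, and the `Agg` backend choice are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from the question's sample data, so the sketch is self-contained
df = pd.DataFrame(
    {
        "StationId": ["UTAHDWQ-5932100", "UTAHDWQ-5932230", "UTAHDWQ-5932240"],
        "amin": ["1994-07-19 13:15:00", "2006-03-16 13:55:00", "1980-10-31 16:00:00"],
        "amax": ["1998-06-30 14:51:00", "2007-01-24 12:55:00", "2007-07-31 11:35:00"],
    },
    index=pd.Index([40623, 40637, 40666], name="index"),
)

# Convert the string columns to a datetime dtype
df["amin"] = pd.to_datetime(df["amin"])
df["amax"] = pd.to_datetime(df["amax"])

# Evenly spaced row positions (0, 1, 2, ...) instead of the raw index values
y_pos = list(range(len(df)))

fig, ax = plt.subplots(figsize=(8, 5))
# One horizontal line per station, spanning its start and end dates
lines = ax.hlines(y_pos, df["amin"], df["amax"], linewidth=4)

# Replace the numeric tick labels with the station names
ax.set_yticks(y_pos)
ax.set_yticklabels(df["StationId"].tolist())
ax.set_xlabel("Date")
fig.tight_layout()
```

Calling `fig.savefig(...)` or, with an interactive backend, `plt.show()` would render the figure. Sorting `df` by `amin` before computing `y_pos` may make the timeline easier to read.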
    

    • The following code also works

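    # using df from the OP
    
    # convert columns to a datetime dtype
    df.amin = pd.to_datetime(df.amin)
    df.amax = pd.to_datetime(df.amax)
    
    fig, ax = plt.subplots(figsize=(8, 5))
    # datetime columns can be passed directly; matplotlib converts them to dates
    ax.hlines(df.index, df.amin, df.amax)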
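
`hlines` draws thin lines. If bar-style Gantt rows are preferred, one alternative is `Axes.barh` with its `left` parameter, so that each bar starts at the station's first date and its width is the duration in days. This is a swapped-in technique, not the original answer's method, and the sketch below makes a few assumptions: it again uses a three-row subset of the question's data, and the names `start_num` and `end_num`, the bar `height`, and the `Agg` backend are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from the question's sample data, so the sketch is self-contained
df = pd.DataFrame(
    {
        "StationId": ["UTAHDWQ-5932100", "UTAHDWQ-5932230", "UTAHDWQ-5932240"],
        "amin": ["1994-07-19 13:15:00", "2006-03-16 13:55:00", "1980-10-31 16:00:00"],
        "amax": ["1998-06-30 14:51:00", "2007-01-24 12:55:00", "2007-07-31 11:35:00"],
    }
)
df["amin"] = pd.to_datetime(df["amin"])
df["amax"] = pd.to_datetime(df["amax"])

# Matplotlib date numbers are floats measured in days
start_num = mdates.date2num(df["amin"].to_numpy())
end_num = mdates.date2num(df["amax"].to_numpy())

fig, ax = plt.subplots(figsize=(8, 5))
# Each bar starts at the station's first date (left) and spans its duration (width)
bars = ax.barh(
    df["StationId"].tolist(),
    width=end_num - start_num,
    left=start_num,
    height=0.5,
)
# Tell matplotlib to format the numeric x values as dates
ax.xaxis_date()
ax.set_xlabel("Date")
fig.tight_layout()
```

Because the station names are passed as the `y` values, matplotlib labels the y-axis with them directly, so no separate tick relabeling is needed in this variant.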
    
