如何对 DataFrame 进行排序,以便回收"重复列中的行?
How can I sort a DataFrame so that rows in the duplicate column are "recycled"?
例如,我原来的 DataFrame 是这样的:
For example, my original DataFrame looks like this:
In [3]: df
Out[3]:
A B
0 r1 0
1 r1 1
2 r2 2
3 r2 3
4 r3 4
5 r3 5
我希望它转向:
In [3]: df_sorted
Out[3]:
A B
0 r1 0
2 r2 2
4 r3 4
1 r1 1
3 r2 3
5 r3 5
对行进行排序,使得列 A 中的行处于回收"状态.时尚.
Rows are sorted such that rows in columns A are in a "recycled" fashion.
我在 Pandas 中搜索过 API,但似乎没有任何合适的方法可以这样做.我可以编写一个复杂的函数来完成此操作,但只是想知道是否有任何智能方法或现有的 pandas 方法可以做到这一点?提前非常感谢.
I have searched APIs in Pandas, but it seems there isn't any proper method to do so. I can write a complicated function to accomplish this, but just wondering is there any smart way or existing pandas method can do this? Thanks a lot in advance.
更新:为错误的陈述道歉.在我真正的问题中,列 B 包含字符串值.
Update:
Apologies for a wrong statement. In my real problem, column B contains string values.
你可以使用cumcount 用于计算列 A 中的重复项,然后是 sort_values 首先由 A (在示例没必要,在实际数据中可能很重要),然后通过 C.最后删除列 C 由 <代码>丢弃:
You can use cumcount for counting duplicates in column A, then sort_values first by A (in sample not necessary, in real data maybe important) and then by C. Last remove column C by drop:
df['C'] = df.groupby('A')['A'].cumcount()
df.sort_values(by=['C', 'A'], inplace=True)
print (df)
A B C
0 r1 0 0
2 r2 2 0
4 r3 4 0
1 r1 1 1
3 r2 3 1
5 r3 5 1
df.drop('C', axis=1, inplace=True)
print (df)
A B
0 r1 0
2 r2 2
4 r3 4
1 r1 1
3 r2 3
5 r3 5
时间安排:
小df (len(df)=6)
In [26]: %timeit (jez(df))
1000 loops, best of 3: 2 ms per loop
In [27]: %timeit (boud(df1))
100 loops, best of 3: 2.52 ms per loop
大 df (len(df)=6000)
In [23]: %timeit (jez(df))
100 loops, best of 3: 3.44 ms per loop
In [28]: %timeit (boud(df1))
100 loops, best of 3: 2.52 ms per loop
计时代码:
df = pd.concat([df]*1000).reset_index(drop=True)
df1 = df.copy()
def jez(df):
df['C'] = df.groupby('A')['A'].cumcount()
df.sort_values(by=['C', 'A'], inplace=True)
df.drop('C', axis=1, inplace=True)
return (df)
def boud(df):
df['C'] = df.groupby('A')['B'].rank()
df = df.sort_values(['C', 'A'])
df.drop('C', axis=1, inplace=True)
return (df)
100 loops, best of 3: 4.29 ms per loop
这篇关于按重复对 DataFrame 的行进行排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!
如何在python中的感兴趣区域周围绘制一个矩形How to draw a rectangle around a region of interest in python(如何在python中的感兴趣区域周围绘制一个矩形)
如何使用 OpenCV 检测和跟踪人员?How can I detect and track people using OpenCV?(如何使用 OpenCV 检测和跟踪人员?)
如何在图像的多个矩形边界框中应用阈值?How to apply threshold within multiple rectangular bounding boxes in an image?(如何在图像的多个矩形边界框中应用阈值?)
如何下载 Coco Dataset 的特定部分?How can I download a specific part of Coco Dataset?(如何下载 Coco Dataset 的特定部分?)
根据文本方向检测图像方向角度Detect image orientation angle based on text direction(根据文本方向检测图像方向角度)
使用 Opencv 检测图像中矩形的中心和角度Detect centre and angle of rectangles in an image using Opencv(使用 Opencv 检测图像中矩形的中心和角度)