I have a pandas DataFrame that contains duplicate values according to two columns (A and B):
A B C
1 2 1
1 2 4
2 7 1
3 4 0
3 4 8
I want to remove the duplicates, keeping the row with the maximum value in column C. This would lead to:
A B C
1 2 4
2 7 1
3 4 8
I cannot figure out how to do that. Should I use drop_duplicates(), or something else?
You can do it using groupby:
c_maxes = df.groupby(['A', 'B']).C.transform('max')
df = df.loc[df.C == c_maxes]
c_maxes is a Series holding the maximum value of C in each group, but with the same length and the same index as df. If you haven't used .transform before, printing c_maxes is a good way to see how it works.
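As a minimal sketch of what the transform step produces, here is the question's sample data run through it. The DataFrame construction is my own reconstruction of the table in the question, and the name result is only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "A": [1, 1, 2, 3, 3],
    "B": [2, 2, 7, 4, 4],
    "C": [1, 4, 1, 0, 8],
})

# For every row, the maximum of C within that row's (A, B) group.
# The result has the same length and index as df.
c_maxes = df.groupby(["A", "B"])["C"].transform("max")
print(c_maxes.tolist())  # [4, 4, 1, 8, 8]

# Keep only the rows whose C equals the maximum of their group.
result = df.loc[df["C"] == c_maxes]
print(result)
```

The rows that survive keep their original index labels (1, 2 and 4 here); call reset_index(drop=True) afterwards if you want a fresh 0..n-1 index.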
Another approach, using drop_duplicates, would be
df.sort('C').drop_duplicates(subset=['A', 'B'], take_last=True)
I'm not sure which is more efficient, but I would guess the first approach, as it doesn't involve sorting.
From pandas 0.18 up, the second solution would be
df.sort_values('C').drop_duplicates(subset=['A', 'B'], keep='last')
or, alternatively,
df.sort_values('C', ascending=False).drop_duplicates(subset=['A', 'B'])
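One difference between the two approaches is worth checking before you pick one: when several rows in a group tie for the maximum of C, the groupby/transform filter keeps all of them, while drop_duplicates keeps exactly one row per group. A small sketch, using made-up data with a tie (the data and the names by_sort and by_groupby are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: group (1, 2) has two rows tied at C == 4.
df = pd.DataFrame({
    "A": [1, 1, 1, 2],
    "B": [2, 2, 2, 7],
    "C": [4, 1, 4, 1],
})

# Sort so the largest C comes first, then keep the first row of each (A, B) group.
by_sort = df.sort_values("C", ascending=False).drop_duplicates(subset=["A", "B"])

# Keep every row whose C equals its group maximum.
by_groupby = df.loc[df["C"] == df.groupby(["A", "B"])["C"].transform("max")]

print(len(by_sort))     # 2: one row per group
print(len(by_groupby))  # 3: both tied rows of group (1, 2) are kept
```

If you need exactly one row per (A, B) pair, the drop_duplicates variant guarantees that; the groupby variant only does so when the maximum is unique within each group.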
In any case, the groupby solution appears to be significantly faster:
%timeit -n 10 df.loc[df.groupby(['A', 'B']).C.transform('max') == df.C]
10 loops, best of 3: 25.7 ms per loop
%timeit -n 10 df.sort_values('C').drop_duplicates(subset=['A', 'B'], keep='last')
10 loops, best of 3: 101 ms per loop
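If you want to repeat the comparison outside IPython, something like the following should work. This is only a sketch: the synthetic data, its size, and the helper names via_groupby and via_sort are my assumptions, and the numbers you get will depend on your data, pandas version and machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size and value ranges are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "A": rng.integers(0, 100, n),
    "B": rng.integers(0, 100, n),
    "C": rng.integers(0, 1_000_000, n),
})


def via_groupby(frame):
    # Keep rows whose C equals the maximum of their (A, B) group.
    group_max = frame.groupby(["A", "B"])["C"].transform("max")
    return frame.loc[frame["C"] == group_max]


def via_sort(frame):
    # Sort by C, then keep the last (largest C) row of each (A, B) group.
    return frame.sort_values("C").drop_duplicates(subset=["A", "B"], keep="last")


for fn in (via_groupby, via_sort):
    seconds = timeit.timeit(lambda: fn(df), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```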