I have this table for documents (simplified version here):
| id | rev | content |
|---|---|---|
| 1 | 1 | ... |
| 2 | 1 | ... |
| 1 | 2 | ... |
| 1 | 3 | ... |
How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ...]. I'm using MySQL.
Currently I use checks in a while loop to detect and overwrite old revs in the result set. But is this the only method to achieve the result? Isn't there a SQL solution?
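For reference, here is a minimal setup that reproduces the sample data. The answers below refer to the table as YourTable; the column types and the (id, rev) key are assumptions, since the question only shows the values:

-- Hypothetical DDL for the sample data; only the column names
-- come from the question, the types and key are assumed.
CREATE TABLE YourTable (
  id      INT          NOT NULL,
  rev     INT          NOT NULL,
  content VARCHAR(255) NOT NULL,
  PRIMARY KEY (id, rev)
);

INSERT INTO YourTable (id, rev, content) VALUES
  (1, 1, '...'),
  (2, 1, '...'),
  (1, 2, '...'),
  (1, 3, '...');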
All you need is a GROUP BY clause with the MAX aggregate function:
SELECT id, MAX(rev)
FROM YourTable
GROUP BY id
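With the sample data, this returns (1, 3) and (2, 1): one row per id with its greatest rev.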
I just noticed that you also need the content column.
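Note that you cannot simply add content to the query above, because it is neither grouped nor aggregated. A sketch of the broken query: with ONLY_FULL_GROUP_BY (MySQL's default since 5.7.5) it is rejected outright, and with that mode disabled MySQL silently returns an arbitrary content per group, not necessarily the one from the max-rev row.

-- Broken sketch: content is neither in GROUP BY nor aggregated, so
-- MySQL either raises an error (under ONLY_FULL_GROUP_BY) or returns
-- an indeterminate value for it.
SELECT id, MAX(rev), content
FROM YourTable
GROUP BY id;

So the content has to be fetched another way, which is what the two approaches below do.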
This is a very common question in SQL: find the whole data for the row with the max value in a column per some group identifier. I heard it a lot during my career. Actually, it was one of the questions I answered in my current job's technical interview.
It is, actually, so common that the Stack Overflow community created a tag just to deal with questions like that: greatest-n-per-group.
Basically, you have two approaches to solve that problem:
In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
SELECT id, MAX(rev) rev
FROM YourTable
GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
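Against the sample data, the sub-query produces the pairs (1, 3) and (2, 1), and the outer join then pulls the full rows for exactly those pairs, giving the expected [1, 3, ...] and [2, 1, ...].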
In this approach, you left outer join the table with itself. Equality goes in the group-identifier. Then, two smart moves:

- The second join condition has the left side's rev less than the right side's rev (a.rev < b.rev below).
- When you do that, the row(s) that actually have the max rev will have NULL on the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.

So you end up with:
SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;
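Tracing the sample data: for id 1, the rows with rev 1 and rev 2 each find a b row with a greater rev, so they are filtered out; the rev 3 row finds no such b row, keeps NULLs on the right side, and survives the b.id IS NULL filter. The single row for id 2 survives the same way.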
Both approaches bring the exact same result.
If you have two rows with the max-value-in-group for a group-identifier, both rows will appear in the result with both approaches.
Both approaches are ANSI SQL compatible and will thus work with your favorite RDBMS, regardless of its "flavor".
Both approaches are also performance friendly, but your mileage may vary (RDBMS, DB structure, indexes, etc.). So when you pick one approach over the other, benchmark, and make sure you pick the one that makes the most sense to you.
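As one concrete (and assumed) example of such tuning: if (id, rev) is not already the table's primary key, a composite index on those two columns typically lets both the sub-query's GROUP BY/MAX and the self-join's comparisons be resolved from the index alone:

-- Hypothetical index; skip it if (id, rev) is already the primary key.
CREATE INDEX idx_yourtable_id_rev ON YourTable (id, rev);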