How would I delete all duplicate data from a MySQL Table?
For example, given the following data:
SELECT * FROM names;
+----+--------+
| id | name   |
+----+--------+
|  1 | google |
|  2 | yahoo  |
|  3 | msn    |
|  4 | google |
|  5 | google |
|  6 | yahoo  |
+----+--------+
I would use SELECT DISTINCT name FROM names; if it were a SELECT query.
How would I do this with DELETE to only remove duplicates and keep just one record of each?
Editor's warning: This solution is computationally inefficient and may cause the connection to drop on large tables.
NB - You need to do this first on a test copy of your table!
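Making that test copy is cheap; a minimal sketch, assuming names_test is a free table name:
CREATE TABLE names_test LIKE names;
INSERT INTO names_test SELECT * FROM names;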
When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.
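Before running either DELETE below, you can preview exactly which rows would be removed by turning the same self-join into a SELECT; a sketch for the keep-the-lowest-id case:
SELECT DISTINCT n1.* FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name;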
If you want to keep the row with the lowest id value:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
If you want to keep the row with the highest id value:
DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
I used this method in MySQL 5.1. Not sure about other versions.
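If the self-join proves too slow, a GROUP BY subquery is one alternative sketch (not the author's method); the extra derived table works around MySQL's restriction on selecting from the table being deleted from:
DELETE FROM names
WHERE id NOT IN (
    SELECT keep_id
    FROM (SELECT MIN(id) AS keep_id FROM names GROUP BY name) AS keep
);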
Update: Since people Googling for removing duplicates end up here
Although the OP's question is about DELETE, please be advised that using INSERT and DISTINCT is much faster. For a database with 8 million rows, the query below took 13 minutes, while using DELETE it ran for more than 2 hours and still hadn't completed.
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
SELECT DISTINCT cellId,attributeId,entityRowId,value
FROM tableName;
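For completeness, a sketch of the full swap this approach implies, using the answer's placeholder names (tableName_old is hypothetical):
-- empty copy with the same structure and indexes
CREATE TABLE tempTableName LIKE tableName;
-- one row per distinct combination
INSERT INTO tempTableName (cellId, attributeId, entityRowId, value)
SELECT DISTINCT cellId, attributeId, entityRowId, value
FROM tableName;
-- atomic swap, keeping the old table as a backup
RENAME TABLE tableName TO tableName_old, tempTableName TO tableName;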