I've read about a few alternatives to MySQL's ORDER BY RAND(), but most of them apply only where a single random result is needed.
Does anyone have any idea how to optimize a query that returns multiple random results, such as this one:
    SELECT u.id,
           p.photo
    FROM users u, profiles p
    WHERE p.memberid = u.id
      AND p.photo != ''
      AND (u.ownership = 1 OR u.stamp = 1)
    ORDER BY RAND()
    LIMIT 18
This solution works best when it uses an indexed column.
Here is a simple example of an optimized query, benchmarked against 100,000 rows.
Optimized: 300ms
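    -- `table` is a placeholder for your table name; TABLE is a reserved word
    -- in MySQL, so it is quoted with backticks here.
    SELECT
        g.*
    FROM
        `table` g
    JOIN
        (SELECT
            id
        FROM
            `table`
        WHERE
            RAND() < (SELECT
                    ((4 / COUNT(*)) * 10)
                FROM
                    `table`)
        ORDER BY RAND()
        LIMIT 4) AS z ON z.id = g.id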
Note about the limit amount: the 4 in LIMIT 4 and the 4 in 4 / COUNT(*) need to be the same number. Changing how many rows you return doesn't affect the speed much: the benchmarks at LIMIT 4 and LIMIT 1000 were the same, and LIMIT 10,000 took it up to 600ms.
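One way to keep the two numbers in sync is to pass the row count in as a parameter. The following is only a sketch using a MySQL prepared statement; the table name `items`, the statement name `random_sample`, and the variable `@n` are hypothetical, and the table is assumed to have an indexed `id` column.

```sql
-- Hypothetical names: items (table), random_sample (statement), @n (row count).
PREPARE random_sample FROM '
    SELECT g.*
    FROM items g
    JOIN (
        SELECT id
        FROM items
        WHERE RAND() < (SELECT (? / COUNT(*)) * 10 FROM items)
        ORDER BY RAND()
        LIMIT ?
    ) AS z ON z.id = g.id';

SET @n = 4;
-- The same value feeds both placeholders, so the filter and the LIMIT stay in step.
EXECUTE random_sample USING @n, @n;
DEALLOCATE PREPARE random_sample;
```

Changing `@n` before `EXECUTE` changes both the sampling threshold and the number of rows returned.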
Note about the join: randomizing just the id is faster than randomizing whole rows, because sorting whole rows means copying each entire row into memory first. The outer query can join any table that is linked to the subquery by id; the join is there to prevent table scans.
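Note about the where clause: the RAND() < ... condition cuts down the number of rows that are being randomized. It takes a percentage of the rows and sorts only those, rather than the whole table. For example, with 100,000 rows and LIMIT 4 the threshold is (4 / 100,000) * 10 = 0.0004, so on average only about 40 rows reach the sort. Because the filter is itself random, the number of candidates varies from run to run; the factor of 10 is a safety margin so that at least 4 rows almost always survive.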
Note about the subquery: if you are doing joins or have extra where clause conditions, you need to put them in both the subquery and the sub-subquery (the one that does the COUNT(*)), so that the count is accurate and the correct data is pulled back.
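As an illustration of that last note, here is a sketch of how the pattern might be applied to the query from the question. It is untested against the asker's schema and assumes that users.id is indexed and that each user has at most one row in profiles (otherwise the same id could be picked more than once). The aliases u2, p2, u3, p3 and z are made up for this example.

```sql
-- Sketch only: the sampling pattern applied to the users/profiles query.
SELECT z.id,
       p.photo
FROM profiles p
JOIN (
    SELECT u2.id
    FROM users u2
    JOIN profiles p2 ON p2.memberid = u2.id
    WHERE p2.photo != ''
      AND (u2.ownership = 1 OR u2.stamp = 1)
      -- Keep each matching row with probability (18 / matching rows) * 10.
      AND RAND() < (
          -- The count repeats the same join and conditions so it is accurate.
          SELECT (18 / COUNT(*)) * 10
          FROM users u3
          JOIN profiles p3 ON p3.memberid = u3.id
          WHERE p3.photo != ''
            AND (u3.ownership = 1 OR u3.stamp = 1)
      )
    ORDER BY RAND()
    LIMIT 18
) AS z ON z.id = p.memberid
WHERE p.photo != '';
```

The join and the photo/ownership/stamp conditions appear twice, once in the sampling subquery and once in the count, which is the duplication the note above describes.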
Unoptimized: 1200ms
    SELECT
        g.*
    FROM
        `table` g
    ORDER BY RAND()
    LIMIT 4
PROS
4x faster than ORDER BY RAND() in the benchmark above (300ms vs. 1200ms). This solution can work with any table that has an indexed column.
CONS
It gets a bit complicated with complex queries: the same joins and conditions have to be maintained in two places, the subquery and the count sub-subquery.