This is probably old knowledge, but for some reason I never ran into it until this week. You should almost never use the RAND() function of MySQL in an ORDER BY.
The ORDER BY RAND() operation actually re-queries each row of your table, assigns a random number ID and then delivers the results. This takes a large amount of processing time for table of more than 500 rows.
We had a script with a query like this:
SELECT id,title,publishDate FROM mytable ORDER BY RAND() LIMIT 5
This took 1400ms to run because the table had 5000+ records.
We changed it to query for a full list of IDs, then randomly choose 5 IDs using ColdFusion and randrange(), then requeried for just those 5 records. Even with two queries instead of one, it is MUCH faster - at 15ms total for the database transactions.
So, as a general rule, AVOID USING ORDER BY RAND(). If you have a really small table it is fine, but it does not scale well at all.






Jul 16, 2009 at 7:11 AM Sounds to me that you didn't have the proper indexing on the table? Can you confirm this? You need an index plan for title, publishDate. Should already be one on id.
Jul 16, 2009 at 7:51 AM Title was not indexed, but publishDate is. With RAND and an index on title, it runs at around 800ms. Better, but not even close. This isn't something unique to me. If you search Google, you will see that a lot of people experience similar issues and recommend not using RAND. For example:
http://forums.mysql.com/read.php?24,163940,163940#msg-163940
http://www.gorilla3d.com/v8/fast-random-rows-in-mysql.html
Jul 16, 2009 at 8:00 AM Again, just a suggestion. Numeric will always be the fastest to index/randomize.
How about this?
SELECT id,title,publishDate FROM mytable WHERE id IN (select t.id FROM mytable t ORDER BY RAND() LIMIT 5)
Jul 16, 2009 at 8:18 AM Todd - The ID column is not numeric, which may be amplifying the problem. It is a UUID. The example you provide still runs between 600 and 800ms.
Thanks for offering up these solutions. I am sure they will be helpful to others who read through this. I think in my case, though, I am best served by allowing ColdFusion to do the randomizing. It runs MUCH faster and the impact is so small that it is probably not worth adjusting.
Jul 16, 2009 at 8:25 AM Yup, that was silly me for assuming id was numeric since columns generally called 'id' are.
When I create a table these days, I still include a numeric auto-num primary key and my uuid's are usually in a column named 'create_uuid' (which follows my create_dt, create_user, update_dt, update_user scheme). Primary numeric keys are still useful.
Jul 24, 2009 at 10:25 AM This is good to know. I didn't realize that it had to requery the table to apply the rand() - I just assumed that happend in real time as the records were being collected.
Jul 27, 2009 at 8:57 PM The performance problem doesn't stem solely from RAND() but from the combination 'ORDER BY RAND()'. There's basically nothing wrong with RAND() as it only generates a random number but sorting the WHOLE table by an array of such random numbers is what demands so much processing time.
Thus, your advice shoouldn't be "AVOID USING RAND()" but "AVOID USING ORDER BY RAND()"
Jul 28, 2009 at 8:38 AM @Bochi - Excellent point. I updated the article to reflect this.
Feb 2, 2010 at 6:32 AM Now this is hghly recommeded post for me. I will surely email this to my friend.
Regards
Richard
Feb 4, 2010 at 11:30 AM Good to know but the fix was not very clear. For one I don't use Cold Fusion and two, posting some exact code would be better (such as PHP a more common language).
Feb 4, 2010 at 2:15 PM @Rob - Sorry, but PHP is for amateurs. :-) Sorry, couldn't resist. You'll have to look elsewhere for your PHP code.
Here's a non-language-specific description of the solution...
1. Query the database for all distinct IDs and store them in a list, array, or struct.
SELECT id FROM mytable
2. Generate 5 (or whatever number) unique numbers between 1 and the length of your list, array, or struct.
3. Get the items at those 5 positions in your list/array/struct.
4. Run a query with an IN statement based on those 5 items.
SELECT * FROM mytable WHERE id IN (3,46,13,12,10)
It seems like this would be slower, but actually it is MUCH faster.
Feb 17, 2010 at 5:15 PM @Michael Sprague sorry, but if you say PHP is for amateurs then you are a really ignorant and stupid person :-). How's Coldfusion if php is for amateurs?
Don't throw words if you don't have some useful to say.
"Sorry, couldn't resist"
Feb 17, 2010 at 7:00 PM @alex - The :-) meant it was a joke. Apparently you are the only one that didn't get that. If you want to know my real opinion on languages, including PHP, read my most recent post.
http://www.webtrenches.com/post.cfm/coldfusion-php-rails-asp-net-and-more-what-do-i-choose
Apr 1, 2010 at 1:59 PM podria dejar un ejemplo de como hacerlo?
gracias
Oct 16, 2010 at 4:26 AM I don't know why the MySQL manual suggests ORDER BY RAND() as a way to fetch random rows if it is soooo slow... You would think MySQL would automatically optimize this query..
Anyway, here's the solution I came up with for PHP (after combining a couple of suggestions from some posts I read).
<?php
//get number of rows for the query
list($max) = mysql_fetch_array(mysql_query("SELECT COUNT(*) FROM some_tbl WHERE cond=1 AND cond2=1"));
//then use php's rand function as the offset
mysql_query("SELECT * FROM some_tbl WHERE cond=1 AND cond2=1 LIMIT " . rand(0, $max - 1) . ", 1");
?>
Of course, this will fetch just one row, but hopefully you will be able to adjust it to your needs.
If you need more than 1 row, you could use a union or generate an array of rand values before performing the second query.
I hope this helps someone!
Sep 30, 2011 at 11:11 AM I was curious about the second query you obtain.
Is it something like (for your 5 rows example) SELECT * FROM mytable WHERE id=1 OR id=4 OR id=3 OR id=6 OR id=9 ? I mean, a long list of " OR id=x " ?
Sep 30, 2011 at 1:56 PM Kev, you could do it that way, but it would be more efficient to use an IN statement like this...
SELECT * FROM mytable WHERE id IN (3,46,13,12,10)
Dec 28, 2011 at 6:22 AM Rob:
$res = mysql_query("SELECT id FROM sweettable");
$rows = mysql_fetch_assoc($res);
$rand_row = $rows[ array_rand($rows) ];
$res = mysql_query("SELECT * FROM sweetttable WHERE id=$rand_row[id]");
$random_guy = mysql_fetch_assoc($res);
Feb 2, 2012 at 1:41 PM Thanks alot!
Im really in trouble with order by rand(), terrible performance