Avoid ORDER BY RAND() in MySQL

ColdFusion , MySQL , Web Development Add comments

This is probably old knowledge, but for some reason I never ran into it until this week.  You should almost never use the RAND() function of MySQL in an ORDER BY.

The ORDER BY RAND() operation actually re-queries each row of your table, assigns a random number ID and then delivers the results.  This takes a large amount of processing time for table of more than 500 rows. 

We had a script with a query like this:
SELECT id,title,publishDate FROM mytable ORDER BY RAND() LIMIT 5

This took 1400ms to run because the table had 5000+ records.

We changed it to query for a full list of IDs, then randomly choose 5 IDs using ColdFusion and randrange(), then requeried for just those 5 records.  Even with two queries instead of one, it is MUCH faster - at 15ms total for the database transactions.

So, as a general rule, AVOID USING ORDER BY RAND().   If you have a really small table it is fine, but it does not scale well at all.

 

29 responses to “Avoid ORDER BY RAND() in MySQL”

  1. Todd Rafferty Says:
    Sounds to me that you didn't have the proper indexing on the table? Can you confirm this? You need an index plan for title, publishDate. Should already be one on id.
  2. Michael Sprague Says:
    Title was not indexed, but publishDate is. With RAND and an index on title, it runs at around 800ms. Better, but not even close. This isn't something unique to me. If you search Google, you will see that a lot of people experience similar issues and recommend not using RAND. For example:

    http://forums.mysql.com/read.php?24,163940,163940#msg-163940
    http://www.gorilla3d.com/v8/fast-random-rows-in-mysql.html
  3. Todd Rafferty Says:
    Again, just a suggestion. Numeric will always be the fastest to index/randomize.

    How about this?
    SELECT id,title,publishDate FROM mytable WHERE id IN (select t.id FROM mytable t ORDER BY RAND() LIMIT 5)
  4. Michael Sprague Says:
    Todd - The ID column is not numeric, which may be amplifying the problem. It is a UUID. The example you provide still runs between 600 and 800ms.

    Thanks for offering up these solutions. I am sure they will be helpful to others who read through this. I think in my case, though, I am best served by allowing ColdFusion to do the randomizing. It runs MUCH faster and the impact is so small that it is probably not worth adjusting.
  5. Todd Rafferty Says:
    Yup, that was silly me for assuming id was numeric since columns generally called 'id' are.

    When I create a table these days, I still include a numeric auto-num primary key and my uuid's are usually in a column named 'create_uuid' (which follows my create_dt, create_user, update_dt, update_user scheme). Primary numeric keys are still useful.
  6. Ben Nadel Says:
    This is good to know. I didn't realize that it had to requery the table to apply the rand() - I just assumed that happend in real time as the records were being collected.
  7. Bochi Says:
    The performance problem doesn't stem solely from RAND() but from the combination 'ORDER BY RAND()'. There's basically nothing wrong with RAND() as it only generates a random number but sorting the WHOLE table by an array of such random numbers is what demands so much processing time.
    Thus, your advice shoouldn't be "AVOID USING RAND()" but "AVOID USING ORDER BY RAND()"
  8. Michael Sprague Says:
    @Bochi - Excellent point. I updated the article to reflect this.
  9. Lab Equipment / Laboratory Equipment Says:
    Now this is hghly recommeded post for me. I will surely email this to my friend.


    Regards

    Richard
  10. Rob Says:
    Good to know but the fix was not very clear. For one I don't use Cold Fusion and two, posting some exact code would be better (such as PHP a more common language).
  11. Michael Sprague Says:
    @Rob - Sorry, but PHP is for amateurs. :-) Sorry, couldn't resist. You'll have to look elsewhere for your PHP code.

    Here's a non-language-specific description of the solution...

    1. Query the database for all distinct IDs and store them in a list, array, or struct.
    SELECT id FROM mytable

    2. Generate 5 (or whatever number) unique numbers between 1 and the length of your list, array, or struct.

    3. Get the items at those 5 positions in your list/array/struct.

    4. Run a query with an IN statement based on those 5 items.
    SELECT * FROM mytable WHERE id IN (3,46,13,12,10)

    It seems like this would be slower, but actually it is MUCH faster.
  12. alex Says:
    @Michael Sprague sorry, but if you say PHP is for amateurs then you are a really ignorant and stupid person :-). How's Coldfusion if php is for amateurs?
    Don't throw words if you don't have some useful to say.
    "Sorry, couldn't resist"
  13. Michael Sprague Says:
    @alex - The :-) meant it was a joke. Apparently you are the only one that didn't get that. If you want to know my real opinion on languages, including PHP, read my most recent post.

    http://www.webtrenches.com/post.cfm/coldfusion-php-rails-asp-net-and-more-what-do-i-choose
  14. leon Says:
    podria dejar un ejemplo de como hacerlo?

    gracias
  15. Andrew Says:
    I don't know why the MySQL manual suggests ORDER BY RAND() as a way to fetch random rows if it is soooo slow... You would think MySQL would automatically optimize this query..

    Anyway, here's the solution I came up with for PHP (after combining a couple of suggestions from some posts I read).

    <?php
    //get number of rows for the query
    list($max) = mysql_fetch_array(mysql_query("SELECT COUNT(*) FROM some_tbl WHERE cond=1 AND cond2=1"));

    //then use php's rand function as the offset
    mysql_query("SELECT * FROM some_tbl WHERE cond=1 AND cond2=1 LIMIT " . rand(0, $max - 1) . ", 1");

    ?>

    Of course, this will fetch just one row, but hopefully you will be able to adjust it to your needs.

    If you need more than 1 row, you could use a union or generate an array of rand values before performing the second query.

    I hope this helps someone!
  16. Kev Says:
    I was curious about the second query you obtain.
    Is it something like (for your 5 rows example) SELECT * FROM mytable WHERE id=1 OR id=4 OR id=3 OR id=6 OR id=9 ? I mean, a long list of " OR id=x " ?
  17. Michael Sprague Says:
    Kev, you could do it that way, but it would be more efficient to use an IN statement like this...

    SELECT * FROM mytable WHERE id IN (3,46,13,12,10)
  18. jake Says:
    Rob:

    $res = mysql_query("SELECT id FROM sweettable");
    $rows = mysql_fetch_assoc($res);

    $rand_row = $rows[ array_rand($rows) ];
    $res = mysql_query("SELECT * FROM sweetttable WHERE id=$rand_row[id]");

    $random_guy = mysql_fetch_assoc($res);
  19. Tim viec lam Says:
    Thanks alot!

    Im really in trouble with order by rand(), terrible performance
  20. wsm Says:
    great article, thanks!
  21. Kev Says:
    Do you use a transaction for this? Someone could delete a record between the two queries.
  22. Mike Says:
    @Kev - yes, you can put this in a <CFTRANSACTION> tag (assuming ColdFusion) if there is the possibility that one of the records would be deleted. Good point.
  23. Bee Says:
    Does this make any difference:

    SELECT *, RAND() as random_field FROM sometable ORDER BY random_field

    Or this is exactly same?
  24. dave Says:
    if you need more that 1 line you can change the limit number
    eg if you have 100 data but need 4 random lines
    rand(0, $max - 4) . ", 4");
    ------------

    <?php
    //get number of rows for the query
    list($max) = mysql_fetch_array(mysql_query("SELECT COUNT(*) FROM some_tbl WHERE cond=1 AND cond2=1"));

    //then use php's rand function as the offset
    mysql_query("SELECT * FROM some_tbl WHERE cond=1 AND cond2=1 LIMIT " . rand(0, $max - 4) . ", 4");

    ?>
  25. richard Says:
    There is nothing wrong with order by RAND(). Just make sure to use correct indexes etc.
  26. Andrew Says:
    @richard Yes there is. Hence this article and discussion. Read the article and try it for yourself. It is super slow because of MySQL's strange way of selecting random rows.
  27. Chuck Says:
    I think the problem is really that you do not have a where clause. So, yes, it will go after the entire table and take a lot of time.

    What you really want here is a random selection of five items from the table, not a random short.

    Let's say you had a selection of 10 items out of a table of 5000 and you wanted to present them shuffled, you'd do the selection with a where clause to get just those ten and add the order by rand() to shuffle them. That's the real purpose of the order by rand().
  28. Michael Sprague Says:
    Chuck - The point of the query is to select 5 completely random records from ALL records in the table. Putting a where clause on it would limit it to the same 5 records every time and just order it differently.
  29. LB Habib Says:
    This is the fastest and most efficient way to query random data from large databases

    list($max) = mysql_fetch_array(mysql_query("SELECT COUNT(*) FROM some_tbl WHERE cond=1 AND cond2=1"));

    mysql_query("SELECT * FROM some_tbl WHERE cond=1 AND cond2=1 LIMIT " . rand(0, $max - 5) . ", 5");

Leave a Reply

Leave this field empty:



Powered by Mango Blog. Design and Icons by N.Design Studio