Web Trenches

Avoid ORDER BY RAND() in MySQL



This is probably old knowledge, but for some reason I never ran into it until this week.  You should almost never use the RAND() function of MySQL in an ORDER BY.

The ORDER BY RAND() operation actually re-queries each row of your table, assigns a random number ID and then delivers the results.  This takes a large amount of processing time for table of more than 500 rows. 

We had a script with a query like this:
SELECT id,title,publishDate FROM mytable ORDER BY RAND() LIMIT 5

This took 1400ms to run because the table had 5000+ records.

We changed it to query for a full list of IDs, then randomly choose 5 IDs using ColdFusion and randrange(), then requeried for just those 5 records.  Even with two queries instead of one, it is MUCH faster – at 15ms total for the database transactions.

So, as a general rule, AVOID USING ORDER BY RAND().   If you have a really small table it is fine, but it does not scale well at all.

 

31 Replies to “Avoid ORDER BY RAND() in MySQL”

  1. Sounds to me that you didn’t have the proper indexing on the table? Can you confirm this? You need an index plan for title, publishDate. Should already be one on id.

  2. Again, just a suggestion. Numeric will always be the fastest to index/randomize.

    How about this?
    SELECT id,title,publishDate FROM mytable WHERE id IN (select t.id FROM mytable t ORDER BY RAND() LIMIT 5)

  3. Todd – The ID column is not numeric, which may be amplifying the problem. It is a UUID. The example you provide still runs between 600 and 800ms.

    Thanks for offering up these solutions. I am sure they will be helpful to others who read through this. I think in my case, though, I am best served by allowing ColdFusion to do the randomizing. It runs MUCH faster and the impact is so small that it is probably not worth adjusting.

  4. Yup, that was silly me for assuming id was numeric since columns generally called ‘id’ are.

    When I create a table these days, I still include a numeric auto-num primary key and my uuid’s are usually in a column named ‘create_uuid’ (which follows my create_dt, create_user, update_dt, update_user scheme). Primary numeric keys are still useful.

  5. This is good to know. I didn’t realize that it had to requery the table to apply the rand() – I just assumed that happend in real time as the records were being collected.

  6. The performance problem doesn’t stem solely from RAND() but from the combination ‘ORDER BY RAND()’. There’s basically nothing wrong with RAND() as it only generates a random number but sorting the WHOLE table by an array of such random numbers is what demands so much processing time.
    Thus, your advice shoouldn’t be “AVOID USING RAND()” but “AVOID USING ORDER BY RAND()”

  7. Good to know but the fix was not very clear. For one I don’t use Cold Fusion and two, posting some exact code would be better (such as PHP a more common language).

  8. @Rob – Sorry, but PHP is for amateurs. 🙂 Sorry, couldn’t resist. You’ll have to look elsewhere for your PHP code.

    Here’s a non-language-specific description of the solution…

    1. Query the database for all distinct IDs and store them in a list, array, or struct.
    SELECT id FROM mytable

    2. Generate 5 (or whatever number) unique numbers between 1 and the length of your list, array, or struct.

    3. Get the items at those 5 positions in your list/array/struct.

    4. Run a query with an IN statement based on those 5 items.
    SELECT * FROM mytable WHERE id IN (3,46,13,12,10)

    It seems like this would be slower, but actually it is MUCH faster.

  9. @Michael Sprague sorry, but if you say PHP is for amateurs then you are a really ignorant and stupid person :-). How’s Coldfusion if php is for amateurs?
    Don’t throw words if you don’t have some useful to say.
    “Sorry, couldn’t resist”

  10. I don’t know why the MySQL manual suggests ORDER BY RAND() as a way to fetch random rows if it is soooo slow… You would think MySQL would automatically optimize this query..

    Anyway, here’s the solution I came up with for PHP (after combining a couple of suggestions from some posts I read).

    < ?php //get number of rows for the query list($max) = mysql_fetch_array(mysql_query("SELECT COUNT(*) FROM some_tbl WHERE cond=1 AND cond2=1")); //then use php's rand function as the offset mysql_query("SELECT * FROM some_tbl WHERE cond=1 AND cond2=1 LIMIT " . rand(0, $max - 1) . ", 1"); ?>

    Of course, this will fetch just one row, but hopefully you will be able to adjust it to your needs.

    If you need more than 1 row, you could use a union or generate an array of rand values before performing the second query.

    I hope this helps someone!

  11. I was curious about the second query you obtain.
    Is it something like (for your 5 rows example) SELECT * FROM mytable WHERE id=1 OR id=4 OR id=3 OR id=6 OR id=9 ? I mean, a long list of ” OR id=x ” ?

  12. Rob:

    $res = mysql_query(“SELECT id FROM sweettable”);
    $rows = mysql_fetch_assoc($res);

    $rand_row = $rows[ array_rand($rows) ];
    $res = mysql_query(“SELECT * FROM sweetttable WHERE id=$rand_row[id]”);

    $random_guy = mysql_fetch_assoc($res);

  13. @Kev – yes, you can put this in a tag (assuming ColdFusion) if there is the possibility that one of the records would be deleted. Good point.

  14. Does this make any difference:

    SELECT *, RAND() as random_field FROM sometable ORDER BY random_field

    Or this is exactly same?

  15. if you need more that 1 line you can change the limit number
    eg if you have 100 data but need 4 random lines
    rand(0, $max – 4) . “, 4”);
    ————

    < ?php //get number of rows for the query list($max) = mysql_fetch_array(mysql_query("SELECT COUNT(*) FROM some_tbl WHERE cond=1 AND cond2=1")); //then use php's rand function as the offset mysql_query("SELECT * FROM some_tbl WHERE cond=1 AND cond2=1 LIMIT " . rand(0, $max - 4) . ", 4"); ?>

  16. @richard Yes there is. Hence this article and discussion. Read the article and try it for yourself. It is super slow because of MySQL’s strange way of selecting random rows.

  17. I think the problem is really that you do not have a where clause. So, yes, it will go after the entire table and take a lot of time.

    What you really want here is a random selection of five items from the table, not a random short.

    Let’s say you had a selection of 10 items out of a table of 5000 and you wanted to present them shuffled, you’d do the selection with a where clause to get just those ten and add the order by rand() to shuffle them. That’s the real purpose of the order by rand().

  18. Chuck – The point of the query is to select 5 completely random records from ALL records in the table. Putting a where clause on it would limit it to the same 5 records every time and just order it differently.

  19. This is the fastest and most efficient way to query random data from large databases

    list($max) = mysql_fetch_array(mysql_query(“SELECT COUNT(*) FROM some_tbl WHERE cond=1 AND cond2=1”));

    mysql_query(“SELECT * FROM some_tbl WHERE cond=1 AND cond2=1 LIMIT ” . rand(0, $max – 5) . “, 5”);

  20. We have done some testing on large client databases and have found it depends on how you are using RAND(). We have seen it work fine on large databases but here is how we recommend doing it.

    1.) Just retrieve only the unique key/ID and not everything * as this speeds things up.
    2.) Your key/ID MUST be numeric/INT or things will be very slow.

    If you keep the above in mind it is possible for RAND() to work very well but otherwise as the OP says things may be impossibly slow.

Leave a Reply

Your email address will not be published. Required fields are marked *