10 MySQL Best Practices for Optimization

Posted on

1. LIMIT 1 When Getting a Unique Row in your table

Sometimes when you are querying your tables, you already know you are looking for just one row. You might be fetching a unique record, or you might just be just checking the existence of any number of records that satisfy your WHERE clause.

In such cases, adding LIMIT 1 to your query can increase performance.
// do I have people from California? // NOT to Do!: $r = mysql_query("SELECT * FROM user WHERE state = 'California'"); if (mysql_num_rows($r) > 0) { // ... other code } // Good request $r = mysql_query("SELECT 1 FROM user WHERE state = 'California' LIMIT 1"); if (mysql_num_rows($r) > 0) { // ... other code }

2. Optimize Queries For the Query Cache

Most MySQL servers have query caching enabled. It's one of the most effective methods of improving performance, that is quietly handled by the database engine. When the same query is executed multiple times, the result is fetched from the cache, which is quite fast.
// query cache does NOT work $r = mysql_query("SELECT username FROM user WHERE signup_date >= CURDATE()"); // cache works! $today_date = date("Y-m-d"); $r = mysql_query("SELECT username FROM user WHERE signup_date >= '$today_date'");

3. Index the Search Fields

Indexes are not just for the primary keys or the unique keys. If there are any columns in your table that you will search by, you should almost always index them.

As you can see, this rule also applies on a partial string search like "last_name LIKE 'a%'". When searching from the beginning of the string, MySQL is able to utilize the index on that column.

You should also understand which kinds of searches can not use the regular indexes. For instance, when searching for a word (e.g. "WHERE post_content LIKE '%tomato%'"), you will not see a benefit from a normal index. You will be better off using mysql fulltext search or building your own indexing solution.

4. Index and Use Same Column Types for Joins

If your application contains many JOIN queries, you need to make sure that the columns you join by are indexed on both tables. This affects how MySQL internally optimizes the join operation.

Also, the columns that are joined, need to be the same type. For instance, if you join a DECIMAL column, to an INT column from another table, MySQL will be unable to use at least one of the indexes. Even the character encodings need to be the same type for string type columns.
// looking for companies in my state $r = mysql_query("SELECT company_name FROM users LEFT JOIN companies ON (users.state = companies.state) WHERE users.id = $user_id"); // both state columns should be indexed // and they both should be the same type and character encoding // or MySQL might do full table scans

5. Avoid your SELECT * requests

The more data is read from the tables, the slower the query will become. It increases the time it takes for the disk operations. Also when the database server is separate from the web server, you will have longer network delays due to the data having to be transferred between the servers.
// not preferred $r = mysql_query("SELECT * FROM user WHERE user_id = 1"); $d = mysql_fetch_assoc($r); echo "Welcome {$d['username']}"; // better code: $r = mysql_query("SELECT username FROM user WHERE user_id = 1"); $d = mysql_fetch_assoc($r); echo "Welcome {$d['username']}";

6. Please, Do Not ORDER BY RAND()

This is one of those tricks that sound cool at first, and many rookie programmers fall for this trap. You may not realize what kind of terrible bottleneck you can create once you start using this in your queries.

If you really need random rows out of your results, there are much better ways of doing it. Granted it takes additional code, but you will prevent a bottleneck that gets exponentially worse as your data grows. The problem is, MySQL will have to perform RAND() operation (which takes processing power) for every single row in the table before sorting it and giving you just 1 row.
// what NOT to do: $r = mysql_query("SELECT username FROM user ORDER BY RAND() LIMIT 1"); // much better: $r = mysql_query("SELECT count(*) FROM user"); $d = mysql_fetch_row($r); $rand = mt_rand(0,$d[0] - 1); $r = mysql_query("SELECT username FROM user LIMIT $rand, 1");

So you pick a random number less than the number of results and use that as the offset in your LIMIT clause.

7. Use ENUM over VARCHAR

ENUM type columns are very fast and compact. Internally they are stored like TINYINT, yet they can contain and display string values. This makes them a perfect candidate for certain fields.

If you have a field, which will contain only a few different kinds of values, use ENUM instead of VARCHAR. For example, it could be a column named "status", and only contain values such as "active", "inactive", "pending", "expired" etc...

There is even a way to get a "suggestion" from MySQL itself on how to restructure your table. When you do have a VARCHAR field, it can actually suggest you to change that column type to ENUM instead. This done using the PROCEDURE ANALYSE() call.

Store IP Addresses as UNSIGNED INT Many programmers will create a VARCHAR(15) field without realizing they can actually store IP addresses as integer values. With an INT you go down to only 4 bytes of space, and have a fixed size field instead.

You have to make sure your column is an UNSIGNED INT, because IP Addresses use the whole range of a 32 bit unsigned integer.

In your queries you can use the INET_ATON() to convert and IP to an integer, and INET_NTOA() for vice versa. There are also similar functions in PHP long2ip() and called ip2long().
$r = "UPDATE usrs SET ip = INET_ATON('{$_SERVER['REMOTE_ADDR']}') WHERE user_id = $u_id";

8. Vertical Partitioning

Vertical Partitioning is the act of splitting your table structure in a vertical manner for optimization reasons.

Example 1: You might have a users table that contains home addresses, that do not get read often. You can choose to split your table and store the address info on a separate table. This way your main users table will shrink in size. As you know, smaller tables perform faster.

Example 2: You have a "last_login" field in your table. It updates every time a user logs in to the website. But every update on a table causes the query cache for that table to be flushed. You can put that field into another table to keep updates to your users table to a minimum.

But you also need to make sure you don't constantly need to join these 2 tables after the partitioning or you might actually suffer performance decline.

9. Smaller Columns - Are Faster

With database engines, disk is perhaps the most significant bottleneck. Keeping things smaller and more compact is usually helpful in terms of performance, to reduce the amount of disk transfer.

MySQL docs have a list of Storage Requirements for all data types. If a table is expected to have very few rows, there is no reason to make the primary key an INT, instead of MEDIUMINT, SMALLINT or even in some cases TINYINT. If you do not need the time component, use DATE instead of DATETIME.

qJust make sure you leave reasonable room to grow or you might end up like Slashdot.

10. Choose the Right Engine

The two main storage engines in MySQL are MyISAM and InnoDB. Each have their own pros and cons.

MyISAM is good for read-heavy applications, but it doesn't scale very well when there are a lot of writes. Even if you are updating one field of one row, the whole table gets locked, and no other process can even read from it until that query is finished. MyISAM is very fast at calculating SELECT COUNT(*) types of queries.

InnoDB tends to be a more complicated storage engine and can be slower than MyISAM for most small applications. But it supports row-based locking, which scales better. It also supports some more advanced features such as transactions.

Free Hit Counters