Why Are Taxonomy Lookups Faster than Post Meta Lookups?
So I’ve read all over the place that looking for posts based on a taxonomy is much quicker than looking up posts based on post meta. But why? I’ve yet to read any sort of technical explanation that satisfies my curiosity. I decided to do some research.
I wrote some example code and used the debug plugin to examine the resulting SQL queries. Here is a post meta query:
<code class="language-php">$args = array(
    'meta_query' => array(
        array(
            'key'   => 'test_key',
            'value' => '1',
        ),
    ),
);
$query = new WP_Query( $args );</code>
Here is the resulting SQL:
<code class="language-sql">SELECT SQL_CALC_FOUND_ROWS trunk_posts.ID
FROM trunk_posts
INNER JOIN trunk_postmeta ON (trunk_posts.ID = trunk_postmeta.post_id)
WHERE 1=1
  AND trunk_posts.post_type = 'post'
  AND (trunk_posts.post_status = 'publish' OR trunk_posts.post_status = 'private')
  AND ( (trunk_postmeta.meta_key = 'test_key' AND CAST(trunk_postmeta.meta_value AS CHAR) = '1') )
GROUP BY trunk_posts.ID
ORDER BY trunk_posts.post_date DESC
LIMIT 0, 10</code>
As you can see, the posts table is joined with the post meta table. Conceptually, a join creates a temporary table in which every row of the posts table is paired with every row of the post meta table. The ON clause then narrows that temporary table down, keeping only rows where a post is paired with post meta that belongs to it. (In practice MySQL doesn't materialize the full cross product; it uses indexes to find matching rows directly, but the model is a useful way to think about it.) Here is a database model of the post and post meta tables:
This joining of posts with their post meta rows is done by comparing values across two indexed columns (trunk_posts.ID and trunk_postmeta.post_id), as shown in the diagram. Creating an index on a column in MySQL makes lookups on that column much quicker. Simply put, MySQL stores the column's values in a B-Tree (InnoDB, the default storage engine, actually uses a variant called a B+Tree). A B-Tree is a data structure that allows search in O(log n) time, vs. O(n) time on an unindexed column, where every row has to be checked. The disadvantage of indexing a column is that inserts, updates and deletes become slower, because the tree has to be updated as well (sometimes splitting or merging nodes); this is generally a worthwhile sacrifice.
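To make the O(log n) vs. O(n) difference concrete, here is a minimal sketch that counts comparisons. It is an assumption-laden stand-in, not how MySQL works internally: a sorted PHP array searched with binary search plays the role of the B-Tree index (a real B-Tree stores many keys per node, but both give logarithmic lookups), and a plain loop plays the role of a scan over an unindexed column. The function names are hypothetical.

```php
<?php
// Sketch only: binary search over a sorted array stands in for a B-Tree index.
// Both halve (or better) the remaining search space at each step: O(log n).

/**
 * Linear scan, as on an unindexed column: check every value in turn.
 * Returns [found, comparisons].
 */
function linear_search(array $values, int $target): array {
    $comparisons = 0;
    foreach ($values as $value) {
        $comparisons++;
        if ($value === $target) {
            return [true, $comparisons];
        }
    }
    return [false, $comparisons];
}

/**
 * Binary search over sorted values, halving the range on every comparison.
 * Returns [found, comparisons].
 */
function binary_search(array $sorted, int $target): array {
    $low = 0;
    $high = count($sorted) - 1;
    $comparisons = 0;
    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        $comparisons++;
        if ($sorted[$mid] === $target) {
            return [true, $comparisons];
        }
        if ($sorted[$mid] < $target) {
            $low = $mid + 1;
        } else {
            $high = $mid - 1;
        }
    }
    return [false, $comparisons];
}

// 100,000 hypothetical post IDs, already in sorted order.
$ids = range(1, 100000);
[$found, $linear] = linear_search($ids, 99999);
[$found2, $binary] = binary_search($ids, 99999);
printf("linear: %d comparisons, binary: %d comparisons\n", $linear, $binary);
```

The linear scan needs 99,999 comparisons to find ID 99999, while binary search needs at most 17 (⌊log2 100000⌋ + 1), which is why an indexed column stays fast as the table grows.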
Next, the WHERE clause narrows the rows down by meta key, keeping only those with meta_key = 'test_key'. Again, as shown in the diagram, trunk_postmeta.meta_key is an indexed column, so this step can also be done in O(log n) time. Awesome, so where is the slowdown? Well, the last thing we have to do is narrow the rows down to those where meta_value = '1', and there is no index on the meta_value column. We have refined our posts as follows:
- posts matching with every post meta row
- posts matching with only post meta that applies to that post
- posts matching with post meta for that post with key ‘test_key’
- posts matching with post meta for that post with key ‘test_key’ and value ‘1’.
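The four refinements above can be mimicked with plain PHP arrays. This is a conceptual sketch only: the two arrays are hypothetical stand-ins for the posts and postmeta tables, and, as noted earlier, MySQL never actually builds the full cross product. It records how many rows survive each step.

```php
<?php
// Conceptual model of the four refinement steps, using arrays as tiny
// hypothetical versions of the posts and postmeta tables.

$posts = [
    ['ID' => 1],
    ['ID' => 2],
    ['ID' => 3],
];
$postmeta = [
    ['post_id' => 1, 'meta_key' => 'test_key',  'meta_value' => '1'],
    ['post_id' => 1, 'meta_key' => 'other_key', 'meta_value' => 'x'],
    ['post_id' => 2, 'meta_key' => 'test_key',  'meta_value' => '0'],
    ['post_id' => 3, 'meta_key' => 'test_key',  'meta_value' => '1'],
];

// 1. Every post paired with every post meta row (cross product).
$rows = [];
foreach ($posts as $post) {
    foreach ($postmeta as $meta) {
        $rows[] = $post + $meta; // merge the two rows' columns
    }
}
$counts = [count($rows)];

// 2. ON clause: keep only post meta that belongs to that post.
$rows = array_filter($rows, fn($r) => $r['ID'] === $r['post_id']);
$counts[] = count($rows);

// 3. WHERE meta_key = 'test_key' (indexed column in the real table).
$rows = array_filter($rows, fn($r) => $r['meta_key'] === 'test_key');
$counts[] = count($rows);

// 4. WHERE meta_value = '1' (unindexed column in the real table).
$rows = array_filter($rows, fn($r) => $r['meta_value'] === '1');
$counts[] = count($rows);

// GROUP BY ID: collect the distinct matching post IDs.
$ids = array_values(array_unique(array_column($rows, 'ID')));
print_r($counts); // rows left after each step
print_r($ids);    // matching post IDs
```

With three posts and four meta rows, the steps leave 12, 4, 3 and 2 rows, and the query returns posts 1 and 3.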
On the final refinement, if many post meta rows share the same key, MySQL has to compare each of their values against '1', because meta_value (a LONGTEXT column in the WordPress schema) has no index to jump into. This search is O(n) in the number of rows with that key, which is slow when that number is large.
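A minimal sketch of why this last step is linear, under stated assumptions: a PHP hash map keyed by meta_key stands in for the B-Tree index on meta_key, and the data set is hypothetical. The index lets us jump straight to the rows for 'test_key', but since there is no index on meta_value, every one of those rows still has to be compared.

```php
<?php
// Sketch: an index on meta_key narrows the rows quickly, but meta_value
// has no index, so each row sharing the key is checked one by one.

/**
 * Build a stand-in "index": meta_key => list of [post_id, meta_value].
 */
function build_key_index(array $postmeta): array {
    $index = [];
    foreach ($postmeta as $row) {
        $index[$row['meta_key']][] = [$row['post_id'], $row['meta_value']];
    }
    return $index;
}

/**
 * Return [matching post IDs, number of meta_value comparisons].
 */
function find_by_meta(array $index, string $key, string $value): array {
    $matches = [];
    $comparisons = 0;
    // Fast step: go straight to the rows for this key via the index.
    foreach ($index[$key] ?? [] as [$post_id, $meta_value]) {
        // Slow step: no index on meta_value, so compare every row.
        $comparisons++;
        if ($meta_value === $value) {
            $matches[] = $post_id;
        }
    }
    return [$matches, $comparisons];
}

// Hypothetical data: 10,000 posts, each with a 'test_key' row and one other row.
$postmeta = [];
for ($id = 1; $id <= 10000; $id++) {
    $postmeta[] = ['post_id' => $id, 'meta_key' => 'test_key', 'meta_value' => (string) ($id % 2)];
    $postmeta[] = ['post_id' => $id, 'meta_key' => 'other_key', 'meta_value' => 'x'];
}

$index = build_key_index($postmeta);
[$matches, $comparisons] = find_by_meta($index, 'test_key', '1');
printf("%d matches after %d comparisons\n", count($matches), $comparisons);
```

The table holds 20,000 rows, and the key index skips the 10,000 'other_key' rows, but all 10,000 'test_key' rows are still compared to find the 5,000 matches. The cost grows with the number of posts that have the key.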
Now let’s look at how a taxonomy query works. Here is my taxonomy query code:
<code class="language-php">$args = array(
    'tax_query' => array(
        array(
            'taxonomy' => 'category',
            'field'    => 'term_id',
            'terms'    => array( 2 ),
        ),
    ),
);
$query = new WP_Query( $args );</code>
Here is the resulting SQL:
<code class="language-sql">SELECT SQL_CALC_FOUND_ROWS trunk_posts.ID
FROM trunk_posts
INNER JOIN trunk_term_relationships ON (trunk_posts.ID = trunk_term_relationships.object_id)
WHERE 1=1
  AND ( trunk_term_relationships.term_taxonomy_id IN (2) )
  AND trunk_posts.post_type = 'post'
  AND (trunk_posts.post_status = 'publish' OR trunk_posts.post_status = 'private')
GROUP BY trunk_posts.ID
ORDER BY trunk_posts.post_date DESC
LIMIT 0, 10</code>
As you can see, the trunk_posts table is joined with the trunk_term_relationships table. Conceptually, the two tables are combined into a temporary table, which the ON portion of the join refines by matching each post with its own term relationship rows.
As shown in the diagram, there is an index on trunk_posts.ID and trunk_term_relationships.object_id, so this first refinement can be done in O(log n) time. Next, the WHERE clause refines the temporary table so that only posts matched with term_taxonomy_id 2 are left (WordPress translates the requested term ID into a term_taxonomy_id before building the query; here both are 2). There is an index on the term_taxonomy_id column, so this can also be done in O(log n) time, and unlike the meta query there is no unindexed value column left to scan. Obviously there is more to the query than just this, but we don't care about anything else for now.
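For contrast with the meta sketch, here is a minimal model of the taxonomy lookup. Again an assumption, not MySQL internals: a PHP hash map keyed by term_taxonomy_id stands in for the index on trunk_term_relationships.term_taxonomy_id, the rows are hypothetical, and `posts_in_terms` is an illustrative helper. Because the term itself is the indexed key, one lookup per requested term yields the matching post IDs directly.

```php
<?php
// Sketch: the term is the index key, so there is no unindexed
// "value" column to scan after the index lookup.

$term_relationships = [
    ['object_id' => 1, 'term_taxonomy_id' => 2],
    ['object_id' => 1, 'term_taxonomy_id' => 5],
    ['object_id' => 2, 'term_taxonomy_id' => 3],
    ['object_id' => 3, 'term_taxonomy_id' => 2],
];

// Build the stand-in index once: term_taxonomy_id => list of post IDs.
$by_term = [];
foreach ($term_relationships as $row) {
    $by_term[$row['term_taxonomy_id']][] = $row['object_id'];
}

/**
 * WHERE term_taxonomy_id IN (...): one index lookup per requested term.
 * Returns distinct post IDs in first-seen order.
 */
function posts_in_terms(array $by_term, array $term_taxonomy_ids): array {
    $ids = [];
    foreach ($term_taxonomy_ids as $tt_id) {
        foreach ($by_term[$tt_id] ?? [] as $object_id) {
            $ids[$object_id] = true; // keyed array acts like GROUP BY ID
        }
    }
    return array_keys($ids);
}

print_r(posts_in_terms($by_term, [2]));
```

Looking up term_taxonomy_id 2 returns posts 1 and 3 without comparing any other column, which mirrors why the taxonomy query stays fast even when a term has many posts.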
Post meta queries will slow down considerably when the key(s) being searched are associated with a large number of posts. Another big reason to avoid post meta queries is that the post meta table is typically much larger than the term_relationships table, so joining posts with postmeta is a much more expensive operation than joining posts with term_relationships. This analysis doesn't take caching into account and is drastically simplified. If you have any questions or corrections, please leave a comment.