The get_post_class() function is a WordPress function commonly used within post “rivers”. For example, if I had a list of posts, WooCommerce products, or any content type really, I might have some code like this:
<div <?php post_class(); ?>>
Note: post_class() just calls get_post_class() and outputs the result to the browser.
post_class() will output something like class="post has-post-thumbnail type-POST_TYPE status-POST_STATUS tag-TAG1 tag-TAG2 category-CATEGORY1 category-CATEGORY2 ...."
The classes added make it easy to style content that has a specific taxonomy term, has a thumbnail, a particular status, etc.
However, the queries needed to determine all this information are not cheap. Moreover, this function is probably called for every post you’re listing. So if you have posts_per_page set to 20, this function will be called 20 times.
Let’s take a look at the function’s code (I’ve trimmed some of the comments):
function get_post_class( $class = '', $post_id = null ) {
	$post = get_post( $post_id );

	$classes = array();

	if ( $class ) {
		if ( ! is_array( $class ) ) {
			$class = preg_split( '#\s+#', $class );
		}
		$classes = array_map( 'esc_attr', $class );
	} else {
		// Ensure that we always coerce class to being an array.
		$class = array();
	}

	if ( ! $post ) {
		return $classes;
	}

	$classes[] = 'post-' . $post->ID;
	if ( ! is_admin() )
		$classes[] = $post->post_type;
	$classes[] = 'type-' . $post->post_type;
	$classes[] = 'status-' . $post->post_status;

	// Post Format
	if ( post_type_supports( $post->post_type, 'post-formats' ) ) {
		$post_format = get_post_format( $post->ID );

		if ( $post_format && !is_wp_error($post_format) )
			$classes[] = 'format-' . sanitize_html_class( $post_format );
		else
			$classes[] = 'format-standard';
	}

	$post_password_required = post_password_required( $post->ID );

	// Post requires password.
	if ( $post_password_required ) {
		$classes[] = 'post-password-required';
	} elseif ( ! empty( $post->post_password ) ) {
		$classes[] = 'post-password-protected';
	}

	// Post thumbnails.
	if ( current_theme_supports( 'post-thumbnails' ) && has_post_thumbnail( $post->ID ) && ! is_attachment( $post ) && ! $post_password_required ) {
		$classes[] = 'has-post-thumbnail';
	}

	// sticky for Sticky Posts
	if ( is_sticky( $post->ID ) ) {
		if ( is_home() && ! is_paged() ) {
			$classes[] = 'sticky';
		} elseif ( is_admin() ) {
			$classes[] = 'status-sticky';
		}
	}

	// hentry for hAtom compliance
	$classes[] = 'hentry';

	// All public taxonomies
	$taxonomies = get_taxonomies( array( 'public' => true ) );
	foreach ( (array) $taxonomies as $taxonomy ) {
		if ( is_object_in_taxonomy( $post->post_type, $taxonomy ) ) {
			foreach ( (array) get_the_terms( $post->ID, $taxonomy ) as $term ) {
				if ( empty( $term->slug ) ) {
					continue;
				}

				$term_class = sanitize_html_class( $term->slug, $term->term_id );
				if ( is_numeric( $term_class ) || ! trim( $term_class, '-' ) ) {
					$term_class = $term->term_id;
				}

				// 'post_tag' uses the 'tag' prefix for backward compatibility.
				if ( 'post_tag' == $taxonomy ) {
					$classes[] = 'tag-' . $term_class;
				} else {
					$classes[] = sanitize_html_class( $taxonomy . '-' . $term_class, $taxonomy . '-' . $term->term_id );
				}
			}
		}
	}

	$classes = array_map( 'esc_attr', $classes );

	$classes = apply_filters( 'post_class', $classes, $class, $post->ID );

	return array_unique( $classes );
}
Within this code, the following functions might result in database queries: get_post_format, has_post_thumbnail, is_sticky, and get_the_terms. The most expensive of these is get_the_terms, which, for each taxonomy associated with the post type, selects all the terms attached to the post in that taxonomy. If there are four taxonomies associated with the post type being queried, get_post_class could result in 7 extra database queries per post (one each for the first three functions, plus one get_the_terms call per taxonomy). With 20 posts per page, that’s an extra 140 queries per page load! On WooCommerce sites, where there are many taxonomies and usually many products shown per page, this is a huge performance killer. Yes, object caching (and, of course, page caching) will reduce or eliminate some of these database queries, but visitors will still hit a cold cache sometimes.
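The arithmetic above can be written as a tiny model. This is a back-of-the-envelope sketch in Python (chosen only because it runs standalone; it is not WordPress code), assuming a cold cache and exactly one query per call, as in the estimate in the text. It is not a measurement of real WordPress behavior.

```python
def extra_queries_per_page(num_taxonomies: int, posts_per_page: int) -> int:
    """Worst-case extra queries from get_post_class() on one page.

    Assumes a cold cache, one query each for get_post_format(),
    has_post_thumbnail() and is_sticky(), plus one get_the_terms()
    query per taxonomy attached to the post type.
    """
    fixed_lookups = 3  # get_post_format, has_post_thumbnail, is_sticky
    per_post = fixed_lookups + num_taxonomies
    return per_post * posts_per_page


print(extra_queries_per_page(num_taxonomies=4, posts_per_page=20))  # 140
```

Because the cost is multiplied by posts_per_page, every extra taxonomy on a product type adds a full page's worth of queries in this model.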
Solution:
Don’t use get_post_class or post_class. It’s not that important: most themes never reference the bulk of the classes it generates. What I do is call the function once, inspect the classes it adds using Chrome’s developer tools, and hardcode into the theme only the classes actually referenced in the CSS.
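The “find the classes your CSS actually uses” step can be partly automated. The helper below is hypothetical (classes_used_in_css is not a WordPress function) and written in Python as a standalone sketch: it takes the class attribute copied from the browser inspector plus the theme’s stylesheet, and keeps only the classes that appear as selectors. The regular expression is a rough approximation of CSS selector syntax, not a real CSS parser.

```python
import re


def classes_used_in_css(generated_classes: str, css: str) -> list[str]:
    """Return the generated post classes that the stylesheet actually references.

    `generated_classes` is the space-separated class attribute value copied
    from the browser inspector; `css` is the theme's stylesheet text.
    """
    # Collect every `.class-name` token in the stylesheet (rough heuristic).
    referenced = set(re.findall(r"\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)", css))
    # Keep the original order so the hardcoded attribute reads naturally.
    return [c for c in generated_classes.split() if c in referenced]


generated = ("post-42 post type-post status-publish format-standard "
             "has-post-thumbnail hentry category-news tag-wordpress")
css = """
.post { margin-bottom: 2em; }
.hentry .entry-title { font-size: 1.5em; }
.type-post h2 { color: #c00; }
"""
print(classes_used_in_css(generated, css))  # ['post', 'type-post', 'hentry']
```

With this sample stylesheet, the hardcoded markup would become <div class="post type-post hentry">.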
PS: body_class() is much less query-intensive and okay to use.
ElasticPress is a 10up WordPress plugin project that integrates Elasticsearch with WordPress. As we all know, search in WordPress is not a great experience. Why? Well, MySQL is not a database optimized for search. Thus, ElasticPress was born.
1. Search result relevancy scores on sites with low post-to-shard ratios can vary depending on order of indexing.
We first noticed this in our integration testing suite. We were using three shards across 1 primary node. Depending on the order that posts were indexed, different relevancy scores were returned for the same search.
Elasticsearch relevancy scores are calculated using term frequency/inverse document frequency (TF/IDF). Term frequency is the number of times a term appears in the queried field of the current document (or post). Inverse document frequency reflects how rare the term is: the more documents contain the term in the queried fields, the less each match is worth. Crucially, those document counts come from the current shard, NOT the whole index. The shard a post lives on is determined by hashing its routing value (the document ID by default) modulo the number of primary shards, so we can’t exactly predict relevancy scores for a search on an index spread across more than one shard. The Elasticsearch documentation has a great article on this.
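To see why per-shard statistics matter, here is a simplified TF-IDF model in Python. The IDF formula, 1 + ln(N / (df + 1)), is an assumption loosely based on Lucene’s classic similarity, not Elasticsearch’s exact scoring, and the shard contents are made up. Both documents contain the query term once, yet they score differently because each shard computes IDF from its own documents only.

```python
import math


def idf(term: str, shard_docs: list[str]) -> float:
    """Simplified IDF computed from ONE shard's documents only."""
    df = sum(1 for doc in shard_docs if term in doc.split())
    return 1.0 + math.log(len(shard_docs) / (df + 1))


def tf_idf(term: str, doc: str, shard_docs: list[str]) -> float:
    tf = doc.split().count(term)  # raw term frequency in this document
    return tf * idf(term, shard_docs)


shard_a = ["wordpress search", "wordpress plugins", "wordpress themes"]
shard_b = ["wordpress search tips", "elasticsearch cluster", "mysql tables"]

# Same term frequency, different scores: each shard only knows its own df.
print(round(tf_idf("wordpress", shard_a[0], shard_a), 3))  # 0.712
print(round(tf_idf("wordpress", shard_b[0], shard_b), 3))  # 1.405

# With everything on a single shard, the two documents score identically.
one_shard = shard_a + shard_b
print(tf_idf("wordpress", shard_a[0], one_shard) == tf_idf("wordpress", shard_b[0], one_shard))  # True
```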
For testing purposes, the solution is to use only one shard. In the real world this matters less, because per-shard term statistics converge as the index grows and documents are spread evenly across shards. However, it is still something to be aware of.
2. There is no single right search algorithm for WordPress. Fine-tuning algorithms is an ongoing, collaborative process.
As of ElasticPress 1.1, the meat of our default search query looked like this:
{
  "query": {
    "bool": {
      "must": {
        "fuzzy_like_this": {
          "fields": [ "post_title", "post_excerpt", "post_content" ],
          "like_text": "search phrase",
          "min_similarity": 0.75
        }
      }
    }
  }
}
fuzzy_like_this is great. It combines fuzzy and more_like_this queries. fuzzy searches against a set of fuzzified terms (using the Levenshtein distance algorithm). more_like_this selects “interesting” terms based on a number of factors, like document frequency, and checks each document against those terms.
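The edit distance that fuzzy matching relies on can be computed with the classic dynamic-programming algorithm. This Python sketch only shows the distance metric itself; it does not model how Elasticsearch expands a term into fuzzy variants or how min_similarity is applied.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("search", "serch"))    # 1
print(levenshtein("kitten", "sitting"))  # 3
```

A misspelled query such as “serch” is one edit away from “search”, which is why fuzzy matching can still find it.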
The problem we encountered was that, in certain established indexes, exact matches were not getting boosted to the very top of results. This was due to the way the fuzzy_like_this algorithm works. We added an extra query to our search algorithm in ElasticPress 1.2 to boost exact matches:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "search phrase",
            "boost": 2,
            "fields": [ "post_title", "post_content", "post_excerpt" ]
          }
        },
        {
          "fuzzy_like_this": {
            "fields": [ "post_title", "post_excerpt", "post_content" ],
            "like_text": "search phrase",
            "min_similarity": 0.75
          }
        }
      ]
    }
  }
}
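The should clause tells Elasticsearch that at least one of the multi_match or fuzzy_like_this queries must match for a document to be returned (this bool query has no must clause). The scores of the matching clauses are added together, and anything found by multi_match gets a boost of 2, so exact matches rise above purely fuzzy ones.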
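A toy Python model of how the two should clauses combine. Assumptions: each clause either matches (with some score) or doesn’t, the boosted scores of matching clauses are simply summed, and Lucene’s coord factor and query normalization are ignored, so the numbers are illustrative only.

```python
from typing import Optional


def bool_should_score(clause_scores: list[Optional[float]], boosts: list[float]) -> Optional[float]:
    """Toy model of a bool query made only of `should` clauses.

    A clause score of None means that clause did not match. The document
    matches if at least one clause matches; its score is the sum of the
    boosted scores of the matching clauses.
    """
    matched = [score * boost for score, boost in zip(clause_scores, boosts) if score is not None]
    if not matched:
        return None  # no clause matched: the document is not returned
    return sum(matched)


BOOSTS = [2.0, 1.0]  # [multi_match (boost 2), fuzzy_like_this (default boost)]

print(bool_should_score([1.0, 1.0], BOOSTS))    # exact + fuzzy match: 3.0
print(bool_should_score([None, 1.2], BOOSTS))   # fuzzy-only match: 1.2
print(bool_should_score([None, None], BOOSTS))  # no match: None
```

In this example the exact match outranks the fuzzy-only match even though its raw fuzzy score is lower.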
This solved our immediate problem, but it is not a perfect algorithm. We expect to continually optimize this for WordPress over time. (Note that ElasticPress allows you to filter the search query entirely if you want to customize it.)
3. Disable indexing during imports.
By default, ElasticPress indexes a post when it is created. This is great until you try to import a few thousand posts and your Elasticsearch instance gets overloaded. This bit us pretty hard. As of newer versions, ElasticPress disables syncing during WordPress imports, big or small.
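The pattern of pausing per-post syncing during an import can be sketched as a context manager. Everything here (SearchSync, on_post_saved, importing) is hypothetical Python for illustration, not ElasticPress’s API or its actual implementation.

```python
from contextlib import contextmanager


class SearchSync:
    """Hypothetical sketch of 'index on save, except during imports'."""

    def __init__(self) -> None:
        self.enabled = True
        self.indexed: list[int] = []  # post IDs sent to the search index

    def on_post_saved(self, post_id: int) -> None:
        # Normal path: every save triggers an index request.
        if self.enabled:
            self.indexed.append(post_id)

    @contextmanager
    def importing(self):
        # Turn per-post syncing off for the duration of an import,
        # and always turn it back on, even if the import raises.
        self.enabled = False
        try:
            yield
        finally:
            self.enabled = True


sync = SearchSync()
sync.on_post_saved(1)                # indexed immediately
with sync.importing():
    for post_id in range(2, 5002):
        sync.on_post_saved(post_id)  # skipped: no per-post index requests
sync.on_post_saved(5002)             # indexing resumes after the import
print(sync.indexed)                  # [1, 5002]
```

In this sketch the imported posts are never sent to the index; a real setup would still need to index them afterwards, for example in one bulk pass.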
So I’ve read all over the place that looking for posts based on a taxonomy is much quicker than looking up posts based on post meta. But why? I’ve yet to read any sort of technical explanation that satisfies my curiosity. I decided to do some research.
I wrote some example code and used the debug plugin to examine the resulting SQL queries. Here is a post meta query:
$args = array(
	'meta_query' => array(
		array(
			'key'   => 'test_key',
			'value' => '1',
		),
	),
);
$query = new WP_Query( $args );
Here is the resulting SQL:
SELECT SQL_CALC_FOUND_ROWS trunk_posts.ID
FROM trunk_posts INNER JOIN trunk_postmeta ON (trunk_posts.ID = trunk_postmeta.post_id)
WHERE 1=1 AND
trunk_posts.post_type = 'post' AND
(trunk_posts.post_status = 'publish' OR trunk_posts.post_status = 'private') AND
( (trunk_postmeta.meta_key = 'test_key' AND
CAST(trunk_postmeta.meta_value AS CHAR) = '1') )
GROUP BY trunk_posts.ID
ORDER BY trunk_posts.post_date DESC
LIMIT 0, 10
As you can see, the posts table is being joined with the post meta table. Conceptually, a join creates a temporary table where every row in the posts table is paired with every row in the post meta table. The ON clause then narrows that temporary table down, keeping only the rows where the post meta belongs to the paired post. Here is a database model of the post and post meta tables:
This joining of posts with their post meta rows is done by comparing values across two indexed columns (trunk_posts.ID and trunk_postmeta.post_id), as shown in the diagram. Creating an index on a column in MySQL makes lookups on that column much quicker. Simply put, MySQL stores a B-Tree of the values in that column. A B-Tree is a balanced tree data structure that allows search in O(log n) time (vs. O(n) time on an unindexed column). The disadvantage of indexing a column with a B-Tree is that inserts and deletes become slower, because the tree must be updated (and sometimes rebalanced) on every write; this is generally a worthwhile sacrifice.
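The O(log n) vs. O(n) difference can be made concrete by counting comparisons. A real MySQL B-Tree is a wide, multi-way tree; binary search over a sorted Python list is used here only as a simpler stand-in with the same logarithmic growth.

```python
def linear_search(values: list[int], target: int) -> tuple[bool, int]:
    """Unindexed column: check rows one by one. Returns (found, comparisons)."""
    comparisons = 0
    for v in values:
        comparisons += 1
        if v == target:
            return True, comparisons
    return False, comparisons


def binary_search(sorted_values: list[int], target: int) -> tuple[bool, int]:
    """Indexed column stand-in: halve the sorted range each step, O(log n)."""
    lo, hi, comparisons = 0, len(sorted_values) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_values[mid] == target:
            return True, comparisons
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, comparisons


ids = list(range(1, 1_000_001))     # one million post IDs
print(linear_search(ids, 999_999))  # (True, 999999)
print(binary_search(ids, 999_999))  # found after at most 20 comparisons
```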
Next, the WHERE clause narrows down posts by meta key, keeping only rows that have meta_key = 'test_key'. Again, as shown in the diagram, trunk_postmeta.meta_key is an indexed column, so this can also be done in O(log n) time. Awesome, so where is the slowdown? Well, the last thing we have to do is narrow the posts down to those where meta_value = '1'. There is no index on the meta_value column. We have refined our posts as follows:
On this final refinement, if many post meta rows share the same key, MySQL has to check the value of each of those rows to find the matches. Without an index, this is a linear O(n) scan over every row with that key, which is slow.
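A small Python model of the meta query’s last step, with made-up data: a dict keyed by meta_key stands in for the meta_key index, and the meta_value check is a plain loop because that column has no index. Every row with the key is examined, even though only a handful match.

```python
from collections import defaultdict

# Hypothetical postmeta rows: (post_id, meta_key, meta_value).
rows = [(pid, "test_key", "1" if pid % 1000 == 0 else "0") for pid in range(1, 10_001)]
rows += [(pid, "other_key", "x") for pid in range(1, 10_001)]

# Stand-in for the index on meta_key: jump straight to the rows for one key.
by_key = defaultdict(list)
for post_id, key, value in rows:
    by_key[key].append((post_id, value))


def meta_query(key: str, wanted: str) -> tuple[list[int], int]:
    """Return (matching post IDs, number of meta_value comparisons)."""
    candidates = by_key[key]  # fast, thanks to the key "index"
    matches, comparisons = [], 0
    for post_id, value in candidates:  # no index on meta_value: linear scan
        comparisons += 1
        if value == wanted:
            matches.append(post_id)
    return matches, comparisons


matches, comparisons = meta_query("test_key", "1")
print(len(matches), comparisons)  # 10 10000
```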
Now let’s look at how a taxonomy query works. Here is my taxonomy query code:
$args = array(
'tax_query' => array(
array(
'taxonomy' => 'category',
'field' => 'id',
'terms' => array( 2 ),
)
)
);
$query = new WP_Query( $args );
Here is the resulting SQL:
SELECT SQL_CALC_FOUND_ROWS trunk_posts.ID
FROM trunk_posts INNER JOIN trunk_term_relationships ON (trunk_posts.ID = trunk_term_relationships.object_id)
WHERE 1=1 AND
( trunk_term_relationships.term_taxonomy_id IN (2) ) AND
trunk_posts.post_type = 'post' AND
(trunk_posts.post_status = 'publish' OR trunk_posts.post_status = 'private')
GROUP BY trunk_posts.ID
ORDER BY trunk_posts.post_date DESC LIMIT 0, 10
As you can see, the trunk_posts table is joined with the trunk_term_relationships table. The two tables are joined into a temporary table, which is refined by the ON portion of the join statement, matching each term relationship row with its corresponding post.
As shown in the diagram, there is an index on trunk_posts.ID and trunk_term_relationships.object_id, so this first refinement can be done in O(log n) time. Next, the WHERE clause refines the temporary table so that only posts related to term_taxonomy_id 2 (the category we queried for) are left. There is an index on the term_taxonomy_id column, so this can also be done in O(log n) time. Obviously there is more to the query than just this, but we don’t care about anything else for now.
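For contrast with the meta_value scan, here is a Python sketch of the taxonomy lookup with made-up data. The relationship rows are kept sorted by term_taxonomy_id, standing in for the index on that column, so binary search (bisect) finds the run of rows for one term in O(log n) and only those rows are read. This is a simplification of what MySQL actually does.

```python
from bisect import bisect_left

# Hypothetical term_relationships rows: (term_taxonomy_id, object_id),
# kept sorted the way an index on term_taxonomy_id keeps its entries ordered.
relationships = sorted(
    [(2, pid) for pid in range(1, 10_001, 1000)]  # 10 posts in term 2
    + [(3, pid) for pid in range(1, 10_001)]      # 10,000 posts in term 3
)


def posts_in_term(term_taxonomy_id: int) -> list[int]:
    """Locate the contiguous run of entries for one term in O(log n),
    then read only those entries; no other rows are examined."""
    start = bisect_left(relationships, (term_taxonomy_id,))
    end = bisect_left(relationships, (term_taxonomy_id + 1,))
    return [object_id for _, object_id in relationships[start:end]]


print(posts_in_term(2))  # [1, 1001, 2001, 3001, 4001, 5001, 6001, 7001, 8001, 9001]
```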
Summary:
Post meta queries will slow down considerably if the key(s) being searched are associated with a large number of posts. Another big reason to avoid post meta queries is that the post meta table is typically much larger than the term_relationships table, so joining posts with postmeta is a much more expensive operation than joining posts with term_relationships. This analysis doesn’t take caching into account and is drastically simplified. If you have any questions or corrections, please leave a comment.