Elasticsearch, Isomorphic JavaScript, Presentations, Search

ZendCon 2017

I'm excited to once again be presenting at ZendCon 2017. This year I'll be doing two talks.

On Tuesday, October 24, I'll be presenting "Isomorphic WordPress Applications using NodeifyWP", which will cover isomorphic JavaScript in WordPress, specifically NodeifyWP, Twenty Sixteen React, and the NodeifyWP Environment. Here are my slides.

On Thursday, October 26, I’ll be presenting “Transforming WordPress Search and Query Performance with Elasticsearch”. This talk will cover Elasticsearch, ElasticPress, and WordPress. Here are my slides:

Elasticsearch, WordCamps

ElasticPress at WordCamp Miami and WordCamp Belo Horizonte 2015

On May 30 I am presenting at WordCamp Miami 2015. My session is titled “Modernizing WordPress Search with Elasticsearch”. I have revamped the talk a bit to accommodate less technical users. Rather than running through Elasticsearch configuration, I am talking more about example ElasticPress queries. I’ll be doing the same talk June 13 at WordCamp Belo Horizonte.

Here are my slides for the talk:

Don’t forget that comprehensive documentation for ElasticPress lives on GitHub.

Database Theory, Elasticsearch, Search, WordCamps, WordPress Plugins

ElasticPress at WordCamp Paris

This weekend I presented at WordCamp Paris 2015. My session was titled “Modernizing WordPress Search with Elasticsearch”. The talk ran through issues with WordPress search, what Elasticsearch is, setting up an Elasticsearch cluster, and configuring ElasticPress.

Elasticsearch is a very exciting technology and I am thrilled at the chance to spread information about it. I (and 10up in general) am very proud of the work we have done on ElasticPress. My hope is that more people will install the plugin and give us feedback as a result of the talk.

Here are my slides for the talk:

Don’t forget that comprehensive documentation for ElasticPress lives on GitHub.

Elasticsearch, Search, WordPress Plugins

Valuable Lessons Learned in ElasticPress

ElasticPress is a 10up WordPress plugin project that integrates Elasticsearch with WordPress. As we all know, search in WordPress is not a great experience. Why? Well, MySQL is not a database optimized for search. Thus ElasticPress was born.

1. Search result relevancy scores on sites with low post-to-shard ratios can vary depending on the order of indexing.

We first noticed this in our integration testing suite. We were using three shards on a single node. Depending on the order in which posts were indexed, different relevancy scores were returned for the same search.

Elasticsearch relevancy scores are based on TF/IDF: term frequency combined with inverse document frequency. Term frequency is the number of times a term appears in the queried field of the current document (or post). Inverse document frequency reflects how rare the term is: the more documents on the current shard that contain the term in the queried field, the less weight the term carries. Notice I said shard, NOT index. The shard a post lives on is determined by a hash of its document ID and the number of primary shards. We can’t exactly predict relevancy scores for a search on an index spread across more than one shard. The Elasticsearch documentation has a great article on this.
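To make the per-shard point concrete, here is a minimal Python sketch. It is not Elasticsearch's actual scoring code: the tf and idf functions are simplified versions modeled on the classic Lucene formulas, and the posts, shard layouts, and function names are made up for illustration.

```python
import math

def idf(term, shard_docs):
    # Simplified classic Lucene-style IDF, computed over ONE shard only:
    # 1 + ln(number of docs on shard / (docs containing term + 1))
    doc_freq = sum(1 for doc in shard_docs if term in doc.split())
    return 1.0 + math.log(len(shard_docs) / (doc_freq + 1))

def score(term, doc, shard_docs):
    # Simplified score: sqrt(term frequency) * per-shard IDF.
    # Real scoring has more factors (field norms, query norm, and so on).
    tf = math.sqrt(doc.split().count(term))
    return tf * idf(term, shard_docs)

post = "wordpress search plugin"

# The same post, sharing a shard with different neighbors in each layout.
shard_layout_a = [post, "search tips"]       # neighbor also contains "search"
shard_layout_b = [post, "cooking recipes"]   # neighbor does not

score_a = score("search", post, shard_layout_a)
score_b = score("search", post, shard_layout_b)

print(f"layout A: {score_a:.3f}")
print(f"layout B: {score_b:.3f}")
```

The post and the search term are identical in both runs. Only the other documents on the shard differ, and that alone is enough to change the score.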

The solution for testing purposes is to only use one shard. In the real world, this shouldn’t matter as inconsistencies plateau as index sizes grow larger. However, this is still something to be aware of.
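As a rough sketch of what that looks like, the snippet below builds an index-creation body that asks for a single primary shard. The variable name is hypothetical, and how the body is sent to Elasticsearch (and under which index name) is left out, since that depends on your test setup.

```python
import json

# Index-creation body for a test index with exactly one primary shard,
# so every post is scored against the same term statistics.
test_index_body = {
    "settings": {
        "index": {
            "number_of_shards": 1,
        }
    }
}

print(json.dumps(test_index_body, indent=2))
```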

2. There is no right search algorithm for WordPress. Fine-tuning algorithms is an ongoing, collaborative process.

As of ElasticPress 1.1, the meat of our default search query looked like this:

{
  "query": {
    "bool": {
      "must": {
        "fuzzy_like_this": {
          "fields": [
            "post_title",
            "post_excerpt",
            "post_content"
          ],
          "like_text": "search phrase",
          "min_similarity": 0.75
        }
      }
    }
  }
}

fuzzy_like_this is great. It combines the fuzzy and more_like_this queries. fuzzy searches against a set of fuzzified terms (generated using the Levenshtein distance algorithm). more_like_this selects “interesting” terms based on a number of factors, such as document frequency, and checks each document against those terms.
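For readers who have not met Levenshtein distance before, here is a small, self-contained Python sketch of the idea. It only illustrates the edit-distance calculation itself. It does not show how Elasticsearch generates fuzzified terms or how min_similarity maps to a distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("wordpres", "wordpress"))
print(levenshtein("kitten", "sitting"))
```

A misspelled search term such as "wordpres" is one edit away from "wordpress", which is the kind of near miss a fuzzy query is meant to catch.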

The problem we encountered was that in certain established indexes exact matches were not getting boosted to the very top of results. This was due to the way the fuzzy_like_this algorithm works. We added an extra query to our search algorithm in 1.2 to boost exact matches:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "search phrase",
            "boost": 2,
            "fields": ["post_title", "post_content", "post_excerpt"]
          }
        },
        {
          "fuzzy_like_this": {
            "fields": ["post_title", "post_excerpt", "post_content"],
            "like_text": "search phrase",
            "min_similarity": 0.75
          }
        }
      ]
    }
  }
}

The should clause tells Elasticsearch that at least one of the multi_match or fuzzy_like_this queries must match for a document to be returned. Anything matched by multi_match has that query's score contribution boosted by a factor of 2.
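If you want to experiment with the boost or the similarity threshold, it can help to generate the query body from a function. The Python sketch below does that. ElasticPress itself is a PHP plugin, so build_search_query and its parameter names are hypothetical and only mirror the JSON shown above. Also note that fuzzy_like_this was removed from later Elasticsearch releases, so this body only applies to the 1.x versions discussed here.

```python
import json

SEARCH_FIELDS = ["post_title", "post_excerpt", "post_content"]

def build_search_query(phrase, exact_boost=2, min_similarity=0.75):
    """Build the bool/should search body: a boosted multi_match
    plus a fuzzy_like_this fallback over the same fields."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        # Matches on the analyzed phrase, boosted above fuzzy hits.
                        "multi_match": {
                            "query": phrase,
                            "boost": exact_boost,
                            "fields": SEARCH_FIELDS,
                        }
                    },
                    {
                        # Fuzzy matches still qualify, just with less weight.
                        "fuzzy_like_this": {
                            "fields": SEARCH_FIELDS,
                            "like_text": phrase,
                            "min_similarity": min_similarity,
                        }
                    },
                ]
            }
        }
    }

print(json.dumps(build_search_query("search phrase"), indent=2))
```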

This solved our immediate problem but is not the perfect algorithm. We expect to continually optimize this for WordPress over time. (Note that ElasticPress allows you to filter the search query entirely if you want to customize it.)

3. Disable indexing during imports.

By default, ElasticPress indexes a post when it is created. This is great until you try to import a few thousand posts and your Elasticsearch instance gets overloaded. This bit us pretty hard. As of newer versions, ElasticPress disables syncing during WordPress imports, big or small.
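The general pattern is easy to sketch. The Python below is not how ElasticPress implements it (the plugin is PHP and hooks into WordPress). PostIndexer, on_post_saved, and importing are hypothetical names used only to show the idea of switching off per-post syncing for the duration of an import.

```python
from contextlib import contextmanager

class PostIndexer:
    """Toy stand-in for a plugin that indexes each post as it is saved."""

    def __init__(self):
        self.sync_enabled = True
        self.indexed = []  # post IDs that would have been sent to Elasticsearch

    def on_post_saved(self, post_id):
        # Only index when syncing is enabled.
        if self.sync_enabled:
            self.indexed.append(post_id)

    @contextmanager
    def importing(self):
        # Suspend per-post syncing for the duration of an import,
        # then restore whatever state was active before.
        previous = self.sync_enabled
        self.sync_enabled = False
        try:
            yield
        finally:
            self.sync_enabled = previous

indexer = PostIndexer()
indexer.on_post_saved(1)                 # normal save: indexed

with indexer.importing():
    for post_id in range(2, 1002):       # a 1,000-post import: nothing indexed
        indexer.on_post_saved(post_id)

indexer.on_post_saved(1002)              # syncing is back on
print(indexer.indexed)
```

Posts created during the import are simply skipped in this sketch, so they would need to be indexed separately once the import finishes.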
