Elasticsearch, WordCamps

ElasticPress at WordCamp Miami and WordCamp Belo Horizonte 2015

On May 30 I am presenting at WordCamp Miami 2015. My session is titled “Modernizing WordPress Search with Elasticsearch”. I have revamped the talk a bit to accommodate less technical users. Rather than running through Elasticsearch configuration, I am talking more about example ElasticPress queries. I’ll be doing the same talk June 13 at WordCamp Belo Horizonte.

Here are my slides for the talk:

Don’t forget that comprehensive documentation for ElasticPress lives on Github.

Database Theory, Elasticsearch, Search, WordCamps, WordPress Plugins

ElasticPress at WordCamp Paris

This weekend I presented at WordCamp Paris 2015. My session was titled “Modernizing WordPress Search with Elasticsearch”. The talk ran through issues with WordPress search, what Elasticsearch is, setting up an Elasticsearch cluster, and configuring ElasticPress.

Elasticsearch is a very exciting technology and I am thrilled at the chance to spread information about it. I (and 10up in general) am very proud of the work we have done on ElasticPress. My hope is that more people will install the plugin and give us feedback as a result of the talk.

Here are my slides for the talk:

Don’t forget that comprehensive documentation for ElasticPress lives on Github.

Elasticsearch, Search, WordPress Plugins

Valuable Lessons Learned in ElasticPress

ElasticPress is a 10up WordPress plugin project that integrates Elasticsearch with WordPress. As we all know search in WordPress is not a great experience. Why? Well, MySQL is not a database optimized for search. Thus ElasticPress was born.

1. Search result relevancy scores on sites with high post to shard ratios can vary depending on order of indexing.

We first noticed this in our integration testing suite. We were using three shards across 1 primary node. Depending on the order that posts were indexed, different relevancy scores were returned for the same search.

Elasticsearch relevancy scores are calculated as term frequency / inverse document frequency. Term frequency is the number of times a term appears in the query field of the current document (or post). Inverse document frequency measures how often the term appears in all query fields across all documents in the index of the current shard. Notice I said shard NOT index. The shard a post lives on is determined by the number of shards and the size of the index. We can’t exactly predict relevancy scores for a search on an index across more than one shard. The Elasticsearch documentation has a great article on this.

The solution for testing purposes is to only use one shard. In the real world, this shouldn’t matter as inconsistencies plateau as index sizes grow larger. However, this is still something to be aware of.

2. There is no right search algorithm for WordPress. Fine tuning algorithms is an on-going, collaborative process.

As of ElasticPress 1.1, the meat of our default search query looked like this:

{
  "query": {
    "bool": {
      "must": {
        "fuzzy_like_this": {
          "fields": [
            "post_title",
            "post_excerpt",
            "post_content"
          ],
          "like_text": "search phrase",
          "min_similarity": 0.75
        }
      }
    }
  }
}

fuzzy_like_this is great. It combines fuzzy and more_like_this queries. fuzzy searches against a set of fuzzified terms (using the levenshtein distance algorithm). more_like_this selects “interesting” terms based on a number of factors like document frequency and checks each document against those terms.

The problem we encountered was that in certain established indexes exact matches were not getting boosted to the very top of results. This was due to the way the fuzzy_like_this algorithm works. We added an extra query to our search algorithm in 1.2 to boost exact matches:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "search phrase",
            "boost": 2,
            "fields": ["post_title", "post_content", "post_excerpt"]
          }
        },
        {
          "fuzzy_like_this": {
            "fields": ["post_title", "post_excerpt", "post_content"],
            "like_text": "search phrase",
            "min_similarity": 0.75
          }
        }
      ]
    }
  }
}

The should query tells Elasticsearch that one of the multi_match or fuzzy_like_this queries must be true for a document to match. It then boosts anything found multi_match x2.

This solved our immediate problem but is not the perfect algorithm. We expect to continually optimize this for WordPress over time. (Note that ElasticPress allows you to filter the search query entirely if you want to customize it.)

3. Disable indexing during imports.

By default ElasticPress indexes when a post is created. This is great until you try to import a few thousand posts, and your Elasticsearch instance gets overloaded. This bit us pretty hard. As of newer versions, ElasticPress disables syncing during WordPress imports big or small.

Command Line WordPress, Unit Testing, WP API

Using the WordPress JSON REST API in Testing Suites with Travis CI

Problem:

We are writing custom endpoints and routes that extend WP API in a theme or plugin. We want to write tests for this code and run tests in Travis CI. Right now, WP API is not in WordPress core and must be included as a plugin. Therefore, we have a WordPress plugin/theme that depends on another plugin. We want to make sure all our dependencies are installed/included when we bootstrap our testing suite. WP API is not a registered Composer package.

Bad Solution #1:

Complain about WP API not being a registered Composer package. Just joking. The reason behind this is that WP API must remain backwards compatible after being included in WordPress core.

Bad Solution #2:

Include WP API in your theme/plugin as a Git submodule. Not only are Git submodules annoying but this doesn’t even make sense. Git submodules point to a specific commit. We want to just install the latest stable version which isn’t possible using submodules unless we want to manually update our submodule commit hash every time WP API is updated.

Final Solution

We need to manually include WP API in our plugin/theme but we want it up-to-date. We want to pretend WP API is a Composer package. If it’s not already there, create a folder vendor/ in the root of your project. This is the same folder that Composer will store your packages. Add vendor/ to your .gitignore since we don’t need this code in production (assuming you don’t have other packages in vendor that are needed in production).

We can execute bash scripts whenever Composer install/update is called. Let’s write a simple Bash script to clone/update WP API into vendor/. Place this code in bin/install-wp-api.sh:

cloneOutput=$(git clone https://github.com/WP-API/WP-API.git ./vendor/wp-api 2>&1)

if [[ $cloneOutput =~ "destination path './vendor/wp-api' already exists" ]]; then
  cd vendor/wp-api
  git reset --hard &>/dev/null
  git checkout master &>/dev/null
  git reset --hard &>/dev/null
  git pull origin master &>/dev/null
fi

This script will clone WP API, if it doesn’t exist. Otherwise it will checkout the master branch and pull the latest from Github.

Now we need to execute this script. We will use Composer scripts. Here is an example composer.json file:

{
    "name": "tlovett1/package-name",
    "description": "Description of my package.",
    "license": "MIT",
    "authors": [
        {
            "name": "Taylor Lovett",
            "email": "email@email.com"
        }
    ],
    "minimum-stability": "stable",
    "require": {},
    "scripts": {
        "pre-install-cmd": [
            "./bin/install-wp-api.sh"
        ],
        "pre-update-cmd": [
            "./bin/install-wp-api.sh"
        ]
    }
}

composer.json defines your package as a Composer package, registers it’s dependancies, and more. As I mentioned before, we can call scripts as well. We use the pre-install-cmd hook to execute our script every time composer install is called. We use the pre-update-cmd hook to execute our script every time composer update is called. On both of these hooks we are executing our script at ./bin/install-wp-api.sh.

Now we simply add the following to our .travis.yml file:

before_script:
   - composer install

In your test suite bootstrap file, you can insert the following PHP to manually use WP API assuming you are using the WordPress standard unit testing suite:

function _manually_load_plugin() {
	require( dirname( __FILE__ ) . '/../vendor/wp-api/plugin.php' );
}
tests_add_filter( 'muplugins_loaded', '_manually_load_plugin' );

EDIT: Another solution which is easier than this one is using a composer inline package.

Command Line WordPress, Node.js, WordPress for Enterprise

Run WordPress Cron on Real Unix Cron with Node.js

WordPress cron is a confusing beast. Most people don’t understand it or intentionally use it. It is not the same technology as Unix cron. Scheduled post functionality actually depends on WordPress cron. By default on every page load, WordPress checks to see if any cron events are due to fire. If an event is due, it sends a request to wp-cron.php asynchronously to execute the event(s).

So what?

This system works great for small websites running simple theme and plugin setups. Often when building WordPress applications for enterprise, enough resource intensive events get setup on cron that HTTP requests timeout before completion. We can circumvent this problem by executing WordPress cron using actual cron.

wp-cron-node is a simple node command that runs WordPress cron events via PHP CLI and Unix cron. It will execute WordPress cron events that are due for execution. The command requires WP-CLI.

Let’s run our scheduled events using actual Unix cron and Node:

  1. Make sure npm and WP-CLI are installed.
  2. Run the following command:
    npm install -g wp-cron-node
  3. Disable WordPress cron. This will prevent HTTP requests from triggering cron events. Put this code in wp-config.php:
    define( 'DISABLE_WP_CRON', true );
  4. Finally, setup your crontab file to your liking. Edit your crontab with the following command:
    crontab -e

    Here is an example crontab entry that will check for scheduled events every 10 minutes.

    */10 * * * * wp-cron-node /path/to/wp

    Note: Running this every 10 minutes means cron events could fire up to 10 minutes late. Ten is a conservative number for performance reasons. Change this to whatever makes you comfortable.

Backbone, WordPress Code Techniques

Syncing Backbone Models and Collections to admin-ajax.php

Backbone is framed around the assumption that your models and collections are backed by RESTful API’s. If this isn’t the case, life becomes difficult. WordPress provides an AJAX API for backend AJAX requests. These requests are sent through admin-ajax.php which is not RESTful.

I’ve written some code to force Backbone models and collections to play nicely with admin-ajax.php. First I’ll show code, then provide explanation:

/**
 * A mixin for collections/models
 */
var adminAjaxSyncableMixin = {
	url: ajaxurl,

	sync: function( method, object, options ) {
		if ( typeof options.data === 'undefined' ) {
			options.data = {};
		}

		options.data.nonce = localizedSettings.nonce;
		options.data.action_type = method;
		options.data.action = 'myaction';

		var json = this.toJSON();
		var formattedJSON = {};

		if ( json instanceof Array ) {
			formattedJSON.models = json;
		} else {
			formattedJSON.model = json;
		}

		_.extend( options.data, formattedJSON );

		options.emulateJSON = true;

		return Backbone.sync.call( this, 'create', object, options );
	}
};

/**
 * A model for all your syncable models to extend
 */
var BaseModel = Backbone.Model.extend( _.defaults( {
	parse: function( response ) {
		// Implement me depending on your response from admin-ajax.php!

		return response
	}

}, adminAjaxSyncableMixin ));

/**
 * A collection for all your syncable collections to extend
 */
var BaseCollection = Backbone.Collection.extend( _.defaults( {
	parse: function( response ) {
		// Implement me depending on your response from admin-ajax.php!

		return response
	}

}, adminAjaxSyncableMixin ));

The bulk of the action happens in our mixin:

var adminAjaxSyncableMixin = {
	url: ajaxurl,

	sync: function( method, object, options ) {}
};

We will extend the defining objects passed to our base model and collection. Therefore BaseCollection and BaseModel will have this objects properties (unless they are overwritten — more on that later).

The url property defines the location to which syncing will occur. In this case we are using ajaxurl which is a variable localized from WordPress by default containing a full URL to admin-ajax.php.

The sync property defines a function that will be called in front of Backbone.sync. Backbone.sync is called whenever we call Backbone.Model.save, Backbone.Model.destroy, Backbone.Model.fetch, Backbone.Collection.save, and Backbone.Collection.fetch. By providing a sync property to our base model and collection, we are forcing our sync function to be called instead of Backbone.sync.

adminAjaxSyncableMixin.sync has a few parameters:

sync: function( method, object, options ) {

}

By default, method is set by the function called. For example, calling Backbone.Model.save or Backbone.Collection.save will call Backbone.sync where method is create or patch (possibly update). method ultimately defines the type of HTTP method used (GET, POST, DELETE, PATCH, PUT, or OPTIONS). object is the model or collection object being synced. options lets us send associative arguments to our sync destination among other things.

Let’s look at the body of this function.

if ( typeof options.data === 'undefined' ) {
	options.data = {};
}

Setting the data property in options lets us manually overwrite the representational data sent to the server. We are going to use this so we make sure it’s defined, if it isn’t.

options.data.nonce = localizedSettings.nonce;
options.data.action_type = method;
options.data.action = 'myaction';

Now we are just setting up information to pass to admin-ajax.php. By default we pass a nonce that has been localized to our script. options.data.action should contain the action slug registered within WordPress using the wp_ajax_ hook. We will force our request to be an HTTP POST so we send our method along inside action_type for later use.

var json = this.toJSON();
var formattedJSON = {};

if ( json instanceof Array ) {
	formattedJSON.models = json;
} else {
	formattedJSON.model = json;
}
_.extend( options.data, formattedJSON );

This code sets up our model or collection data to be passed to the server. If the toJSON() representation of the current object is an Array, we know we have a collection. We extend the options.data object with the formattedJSON object we create.

options.emulateJSON = true;

This sends our data as application/x-www-form-urlencoded (classic form style) instead of application/json preventing us from having to decode JSON in our endpoint.

return Backbone.sync.call( this, 'create', object, options );

Finally, we call Backbone.sync in the current object context. We pass create as the method (forcing a POST request). object is simply passed along. We pass options having extended it with our own data. Essentially, our sync function is an intermediary between Backbone save/fetch and Backbone.sync.

var BaseModel = Backbone.Model.extend( _.defaults( {
	idAttribute: 'ID',
	parse: function( response ) {
		// Implement me depending on your response from admin-ajax.php!

		return response
	}
}, adminAjaxSyncableMixin ));

var BaseCollection = Backbone.Collection.extend( _.defaults( {
	parse: function( response ) {
		// Implement me depending on your response from admin-ajax.php!

		return response
	}
}, adminAjaxSyncableMixin ));

We define BaseModel by extending Backbone.Model and mixing in adminAjaxSyncableMixin. _.defaults returns an object filling in undefined properties of the first parameter object with corresponding property in the second parameter object. We define BaseCollection the same way extending Backbone.Collection and mixing in adminAjaxSyncableMixin.

Backbone.Model.parse and Backbone.Collection.parse intercept sync responses before they are processed into models and model data. Depending on how you write your admin-ajax.php endpoints, you may need to write some parsing code.

Finally, we can define, instantiate, and utilize new models and collections based on BaseModel and BaseCollection:

var myModel = BaseModel.extend({});
var myCollection = BaseCollection.extend({});

var modelInstance = new myModel( { ID: 1 } );
modelInstance.fetch();
modelInstance.set( 'key', 'value' );
modelInstance.save();

var collectionInstance = new myCollection();
collectionInstance.fetch();
collectionInstance.at( 0 ).set( 'key', 'value' );
collectionInstance.save();
Presentations, WordPress Core

JSON REST API for WordPress at the DC API User Group

Today I am presenting on the JSON REST API for WordPress at the DC API User Group. This is a shorter talk geared at both developers and API users with or without WordPress experience.

With core integration coming in the near future, it’s important for developers of ALL backgrounds to understand it will be available on ~23% of the websites on the internet. Here are the slides for my talk:

WordPress Core, WordPress Themes, WP API

A WordPress Starter Theme Based on JSON REST API’s Backbone Client

The past few months I’ve had the opportunity to work on the new JSON REST API for WordPres. My biggest contribution as a WP API team member has been the Backbone client.

The JSON REST API’s Backbone client let’s you interact with a WordPress installation using Backbone.js collections and models. The client is an extremely useful tool in creating reactive web applications (which seems to be where the web is heading).

As a proof of concept, I created a WordPress starter theme based on Automattic’s _s named _s_backbone. Loops (or post streams) in _s_backbone are driven by Backbone.js collections. This means that posts are grabbed on the fly without a page reload. Pagination is accomplished through a “more” button which, again, does not require a page reload. This is commonly referred to as “infinite scroll”.

Please download _s_backbone from Github. Any feedback is appreciated.

Edit: Check out the WP Tavern article on _s_backbone.

Node Module for Detecting Modified Core WordPress Files

Modifying core WordPress files is a quick way to introduce security vulnerabilities, break your site, and break future updates. From the perspective of a developer, nothing is worse than searching through a theme and plugins for a problem only to discover a previous developer modified a core WordPress file an introduced a bug.

I wrote a simple Node module that lets you easily check WordPress installations for modified or removed core files. Install the detect-wp-core-modifications npm package with the following shell command:

npm install -g detect-wp-core-modifications

You can run the command without any arguments from within the root of a WordPress installation with the following shell
command:

detect-wp-core-modifications

You can also specify a relative or absolute path to a WordPress installation:

detect-wp-core-modifications ../wordpress