Using Solr to create scalable and dynamic Drupal Commerce facets

Oct 13, 2015

We were recently tasked with building search capabilities for a Drupal Commerce-based website that would allow faceting and full-text searching for products whose attribute data was stored in a third-party ERP system. In a basic eCommerce implementation this is a relatively simple task: a standard Drupal site with the Search API module requires only a few clicks to create a highly customized search experience using blocks.

However, in this case the company had hundreds of thousands of independent products in Drupal, each with around 20 varying attributes that needed to be facetable. Storing all this data in Drupal would have required creating 1,000 individual facet groups: that's a lot of fields and a lot of blocks! It was impractical to import all this data into Drupal and manage it with the Field API. What would happen if the client wanted to add a new facet, or double the number of products?

Fortunately, the client maintained a highly normalized internal SQL database that allowed us to generate these attributes via complex SQL queries. My jaw dropped from time to time when they would provide us with a page-long query that would execute in milliseconds. Although MySQL has some powerful capabilities and was able to give us this data relatively quickly, we knew it wouldn't be as fast when performing full-text or other complex searches. We had to find a way to condense this information and create efficient, searchable indexes that wouldn't overload the server.

Solr to the rescue!

After a few brainstorming sessions we decided to implement a solution using Solr and a bit of Drupal magic. Solr is open source search software developed by the Apache Software Foundation and designed to be a super powerful search engine. Solr works by indexing groups of fields (or "documents") that represent a piece of content such as a product. These indexes can be standard indexes like those found in MySQL, or they can use advanced text analysis, allowing for Google-like searches. Solr also comes with many features used for scaling, such as fault tolerance, distributed indexing, automated failover, load-balanced querying, and replication. Solr was the perfect tool to accomplish this feat, as it allows for faceting and integrates with Drupal's Search API module.

More information on Solr can be found on the project website: http://lucene.apache.org/solr/

Search API

Drupal’s Search API module already has great integrations with powerful search engines like Solr. The Drupal community has also produced additional modules that implement everything from facet support to autocompletion.
Since Solr is loosely schema-based, you need to create a Solr collection (analogous to a database) using the default configuration provided by the Search API Solr module.

This configuration allows you to flatten a Drupal database into entity-based documents. For example, one document might store the data of a certain node while another stores the data of a product entity. With our original problem of manageability in mind, we decided to import only the basic information about a product into Drupal, such as the SKU, category, and price. The rest of the data is stored in a second Solr collection that is managed and queried manually.

Creating and importing a custom Solr collection

To get all the metadata from our customer's database into a Solr collection we first had to architect a schema. Solr allows for dynamic fields, which is a great option for supporting thousands of different product attribute types. We decided to prefix these fields with "attr_" and append the title of the attribute to create the Solr field name. We used the title instead of an ID because the attributes' unique IDs were not consistent identifiers and could change at any time. Thus an attribute called "Size of product" would be stored in a document as "attr_Size of product". This ensured the fields would be consistent across all documents and could be pulled into Drupal easily. We also added a "sku" field (item_no in the schema below) to record which product a set of attributes corresponded to. This method allowed us to have a virtually unlimited number of attributes per product and a virtually unlimited number of products.

My schema looked similar to the following:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="erp_core-4.2-solr-4.x" version="1.3">
  <types>
    <!-- types -->
  </types>
  <fields>
    <field name="item_no" type="text_general" indexed="true" stored="true" />
    <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" />
  </fields>
</schema>
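The attr_* dynamicField is what makes this schema scale: any field whose name begins with "attr_" is automatically indexed as text_general without having to be declared ahead of time, so new attributes require no schema changes at all.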

To get all of the attributes properly imported we used Solr's DataImportHandler with nested entity queries built around the page-long queries the client originally sent us. We also used a script transformer to post-process each row and set the field key correctly. For example:

<dataConfig>
  <dataSource … />
  <document>
    <entity name="items" query="SELECT * FROM items">
      <field column="item_no" name="item_no" />
      <entity name="attr_series"
              query="CALL item_details('${items.item_no}', '${items.mfg_code}', true);"
              transformer="script:AttrTranslation">
      </entity>
    </entity>
  </document>
  <script>
  <![CDATA[
  function AttrTranslation(row) {
    // Collapse each name/value row into a single dynamic field, e.g. an
    // attr_name of "Size of product" becomes the field "attr_Size of product".
    var n = row.get('attr_name');
    var v = row.get('attr_value');
    row.clear();
    row.put('attr_' + n, v);
    return row;
  }
  ]]>
  </script>
</dataConfig>
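With this dataConfig in place, the initial load can be triggered through the DataImportHandler's standard full-import command and re-run whenever the ERP data changes.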

Now all we had to do was get the attributes into Drupal facets.

Creating Drupal facets from a custom Solr collection

Finally back to the world of Drupal… almost done! There are three parts to building the Drupal side of things. First, we need a way to get a list of facets. Second, we need a way to display the generated facets. Finally, we need to make sure that the selected facets properly limit the results.

To generate facets on our search page we created a block that would only appear on pages where Solr searches happened. The block definition used a visibility callback that checked a static flag we set during the main Drupal Solr query. To generate the content we manually made a Solr request to our custom collection. This query made use of a 'Solr join' statement to limit results to items that existed in both our custom collection and the Drupal one. The first step was creating a connection to our custom collection using entity_load and getSolrConnection.
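A rough sketch of that connection setup (the server machine name 'erp_core2' is illustrative, and depending on your Search API Solr version you may need to reach getSolrConnection() through the server's service object):

// Load the Search API server entity that points at the custom ERP collection.
$servers = entity_load('search_api_server', FALSE, array('machine_name' => 'erp_core2'));
$server = reset($servers);

// getSolrConnection() comes from the Search API Solr service class and
// returns the low-level connection object used for raw requests.
$connection = $server->getSolrConnection();

With the connection in hand, we generated the query with the following code: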

$result = $connection->makeServletRequest('select', array(
  'q' => '{!join from=ss_title to=item_no_lower fromIndex=' . $drupal_core . '}' . $drupalQuery,
  'facet.field' => $facets,
  'facet' => 'true',
  'fq' => $fq,
));

In this query, $fq, the filter query, is generated from standard GET parameters, similar to how Search API would build them. In a normal Solr query this is just a set of key:value filter pairs appended to the request. The variable $facets is an array of attribute titles prefixed with "attr_" (as noted before). We generated this list by directly querying the client's database for the attributes associated with whatever category the user is on. This returns a list of facets that can be rendered dynamically as HTML in the block; it could just as easily return all 1,000 if we wanted that many facets. And since the facet list comes straight from the client's MySQL database, the client can control which facets are used, and even each facet's weight, through a custom table.
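As an illustration, the inputs might be assembled like this (the f_i GET parameter matches the links our facet block renders, and erp_get_category_attributes() is a hypothetical stand-in for the client-database lookup):

// Facet fields: attribute titles for the current category, prefixed
// with "attr_" to match the dynamic field in the custom schema.
$facets = array();
foreach (erp_get_category_attributes($category_id) as $title) {
  $facets[] = 'attr_' . $title;
}

// Filter query: key:value pairs from the f_i GET parameter
// (escaping omitted here for brevity; see betterSolrEscape() below).
$fq = '*:*';
if (!empty($_GET['f_i'])) {
  $pairs = array();
  foreach ($_GET['f_i'] as $name => $value) {
    $pairs[] = $name . ':"' . $value . '"';
  }
  $fq = implode(' AND ', $pairs);
}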

To make sure the selected facets properly limit the Search API results, we modified the Solr query created by Drupal and added a join statement. To achieve this, we extended the SearchApiSolrService class defined by the Search API Solr module and overrode its preQuery() method. We then used a hook to tell Drupal to load our version of the class instead of the default one.
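In Search API that swap is done with hook_search_api_service_info_alter(); a minimal sketch, with module and class names that are illustrative:

/**
 * Implements hook_search_api_service_info_alter().
 */
function mymodule_search_api_service_info_alter(array &$service_info) {
  // Point the Solr service entry at our subclass so our preQuery()
  // override runs for every Solr-backed search.
  $service_info['search_api_solr_service']['class'] = 'MyModuleSolrJoinService';
}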

Within the preQuery() method we appended a Solr join statement to apply the custom facet selection to the query. We also mirrored some of the parameters passed in from the original Drupal query (such as the category or text search) to further limit results from the ERP Solr collection.

Our preQuery function was similar to the following:

protected function preQuery(array &$call_args, SearchApiQueryInterface $query) {
  // The custom collection's core name is stored in a Drupal variable.
  $core2 = variable_get('solr_join_core2');

  // Nothing to join against if Drupal added no filter queries.
  if (empty($call_args['params']['fq'])) {
    return;
  }

  // Base join statement against the ERP collection.
  $joinStatement = '{!join from=item_no_lower to=ss_title fromIndex=' . $core2 . '}';

  // Add in the facet filters the user selected.
  if (isset($_GET['f_i'])) {
    foreach ($_GET['f_i'] as $filter => $val) {
      $joinStatement .= $this->betterSolrEscape($filter) . ':' . $this->betterSolrEscape($val) . ' ';
    }
  }
  else {
    $joinStatement .= '*:*';
  }

  // Store the Drupal side of the query so the facet block can reuse it.
  $drupalQuery = &drupal_static('module_solr_drupal_query');
  $drupalQuery = implode(' AND ', $call_args['params']['fq']);

  // Add the join to the filter queries.
  $call_args['params']['fq'][] = $joinStatement;
}
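The betterSolrEscape() helper isn't shown here; a minimal version might simply wrap the escaping that ships with the Search API Solr connection class:

protected function betterSolrEscape($value) {
  // Escape Lucene special characters so user-supplied filter names and
  // values can't break out of the join statement.
  return SearchApiSolrConnection::escape($value);
}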

You may notice in the above code we store the original parameters in a static variable. This is a clever little hack we used to transfer the parameters to the facet block mentioned earlier without resorting to globals.
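On the block side, reading the static back out looks roughly like this:

// In the facet block's build code: pull the Drupal side of the query
// that preQuery() stashed. The static key matches the one used above.
$drupalQuery = drupal_static('module_solr_drupal_query');
if ($drupalQuery === NULL) {
  // No Solr search ran on this page, so render nothing.
  return array();
}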

Powering your search with Solr

Scaling product facets proved to be a very interesting challenge. A lot can already be accomplished using the community-provided Search API module, but there are still some gaps when scaling the number of fields into the thousands or even millions. Fortunately, Drupal's flexibility allows us to step outside the box and identify creative solutions that make use of the unique tools available in a given client situation. I hope this post provides you with a framework for using Drupal methodologies to extend the capabilities of Search API, giving you access to information that isn't practical to store in a standard Drupal installation. The power of Drupal, Search API, Solr, and a little creative thinking allowed us to solve a common scalability problem.
