Recently, one of our projects required searching content items based on a “Keywords” field. The field was of type single-line text and contained a comma-separated list of keywords, each of which could be a single word or a phrase. The search query had to match one of the keywords exactly for the content item to be returned in the search results. For example, consider a field that contains the following keywords:
- apple
- orange
- banana pineapple
So, the content item should be returned as a result only if the search query is exactly “apple”, “orange”, or “banana pineapple”.
The first approach we looked at was to use the “.Contains()” method in the LINQ query:
.Where(x => x.Keywords.Contains(searchQuery))
This approach did not work. If the search query was a fragment such as “range” or “nana”, it matched part of a keyword in the field (“orange”, “banana”), and the content item was added to the search results, which is not what was required.
The next approach that came to mind was to use the “.Equals()” method so that words would be matched exactly. With it, “nana” and “range” no longer resulted in a hit. However, hits were still made when the search query was “banana”, “pineapple”, or “apple orange”. This is because Lucene tokenizes each word in the field regardless of the commas. “banana pineapple” is tokenized as two separate words, so a hit is made in the index if the search query is either of the two words. Similarly, when the user types “apple orange”, a match is found because the comma between the two keywords is not taken into account.
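To make the difference between the three behaviours concrete, here is a minimal sketch in plain .NET. It does not call Lucene or Sitecore; the class and method names are hypothetical, and `TokenizedPhraseMatch` is only a simplified model of how a tokenized field behaves, assuming tokens are split on spaces and commas.

```csharp
using System;
using System.Linq;

// Hypothetical illustration only: models the matching behaviour described
// above with plain strings instead of a real Lucene index.
public static class KeywordMatchingSketch
{
    // Approach 1: substring test on the raw field value (like .Contains()).
    public static bool SubstringMatch(string field, string query)
    {
        return field.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Approach 2 (simplified model): the field is broken into single-word
    // tokens, commas are discarded, and the query matches if its words
    // appear as consecutive tokens.
    public static bool TokenizedPhraseMatch(string field, string query)
    {
        var separators = new[] { ' ', ',' };
        var fieldTokens = field.ToLowerInvariant()
            .Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var queryTokens = query.ToLowerInvariant()
            .Split(separators, StringSplitOptions.RemoveEmptyEntries);
        if (queryTokens.Length == 0)
        {
            return false;
        }

        // Padding with spaces ensures only whole tokens can match.
        var paddedField = " " + string.Join(" ", fieldTokens) + " ";
        var paddedQuery = " " + string.Join(" ", queryTokens) + " ";
        return paddedField.Contains(paddedQuery);
    }

    // Desired behaviour: the query must equal one whole comma-separated keyword.
    public static bool ExactKeywordMatch(string field, string query)
    {
        var wanted = query.Trim().ToLowerInvariant();
        return field.Split(',')
            .Select(keyword => keyword.Trim().ToLowerInvariant())
            .Any(keyword => keyword.Length > 0 && keyword == wanted);
    }
}
```

With the field value "apple, orange, banana pineapple", `SubstringMatch` accepts “nana”, `TokenizedPhraseMatch` rejects “nana” but accepts “banana” and “apple orange”, and `ExactKeywordMatch` accepts only the three full keywords.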
Since neither approach worked, we built a custom index field that does not tokenize the words but stores each keyword as a single term. The custom index field stores the keywords as a list of strings, so only a search query that exactly matches one of the strings in the list results in a hit.
To implement the custom index field, the following changes have to be made in the solution.
First, add the new index field to your configuration and specify the class within your project that computes its value.
<fields hint="raw:AddComputedIndexField">
<field fieldName="_searchkeywords">YourProject.Custom.Index.ComputedFields.ExactKeywordsField, YourProject</field>
</fields>
The field also needs to be added to the fieldMap, where you can specify its properties such as indexType="UNTOKENIZED", storageType, the data type, and so on:
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
<fieldNames hint="raw:AddFieldByFieldName">
<field fieldName="_searchkeywords" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
<Analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
</field>
</fieldNames>
</fieldMap>
The value of the new index field is computed by splitting the “Keywords” field on commas, then trimming and lowercasing each entry to produce a list of strings:
return item.Fields["Keywords"].Value.Split(',').Select(x => x.Trim().ToLower()).ToList();
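The splitting step can be isolated into a small helper so that it is easy to test outside Sitecore. The following is a sketch under assumptions, not the original implementation: `KeywordFieldParser` and `Parse` are hypothetical names, and the extra handling of null values and empty entries (for example a trailing comma) goes beyond what the one-liner above does. The computed field class registered in the configuration would implement Sitecore's `IComputedIndexField` and could return this list from its `ComputeFieldValue` method.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Sitecore: turns the raw value of the
// "Keywords" field into the list of terms to store in the index.
public static class KeywordFieldParser
{
    public static List<string> Parse(string rawFieldValue)
    {
        // A missing or blank field yields no index terms.
        if (string.IsNullOrWhiteSpace(rawFieldValue))
        {
            return new List<string>();
        }

        return rawFieldValue
            .Split(',')
            // Trim surrounding spaces and lowercase so matching is case-insensitive.
            // ToLowerInvariant is an assumption; the query must be lowercased the same way.
            .Select(keyword => keyword.Trim().ToLowerInvariant())
            // Skip empty entries caused by stray or trailing commas.
            .Where(keyword => keyword.Length > 0)
            .ToList();
    }
}
```

Keeping a phrase such as “banana pineapple” as one entry is what allows the untokenized field to match it only as a whole.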
Upon rebuilding the index, we see that it now stores the keywords as three separate entries: “apple”, “orange”, and “banana pineapple”.
Once the keywords are stored as separate entries, we can modify the LINQ query to use the “.Equals()” method, which matches the search query exactly against one of the stored keywords:
.Where(x => x.ExactKeywords.Equals(search.ToLower()))
Now, if a user searches for exactly one of the three keywords, the content item is returned; partial words and combinations that span two keywords no longer produce a hit.