[Recipes] Slop, Fuzziness and Proximity Searches in Elasticsearch

Problem: 

Demonstrate how slop and fuzziness in Elasticsearch let a search tolerate missing terms, match terms that are near each other (proximity searches), and recover from spelling mistakes.

Solution Summary: 

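The slop parameter of a match_phrase query specifies how far a term may be moved (in either direction), counted in term positions, for a document to still satisfy the phrase. It tolerates missing terms and enables proximity searches. The fuzziness parameter specifies how many character edits are allowed between the search term and an indexed term, which tolerates spelling mistakes.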
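The way slop is counted can be sketched in a few lines of Python. This is a simplified model, not the code Elasticsearch runs: the function name min_slop and the pre-tokenized, lowercased input lists are assumptions made for illustration. For each query term it compares the position in the document with the position in the query, and the spread between the largest and smallest shift is the slop the phrase needs.

```python
from itertools import product


def min_slop(query_terms, doc_terms):
    """Return the smallest slop at which the phrase matches, or None.

    Simplified model: both arguments are lists of already analyzed
    (lowercased) terms. None means a query term is absent from the
    document, so no slop value can produce a match.
    """
    candidate_positions = []
    for term in query_terms:
        hits = [i for i, t in enumerate(doc_terms) if t == term]
        if not hits:
            return None
        candidate_positions.append(hits)

    best = None
    # Try every way of assigning a document position to each query term.
    for combo in product(*candidate_positions):
        if len(set(combo)) != len(combo):
            continue  # two query terms cannot share one document position
        # Shift of each term relative to its position in the query phrase.
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        needed = max(shifts) - min(shifts)
        if best is None or needed < best:
            best = needed
    return best


if __name__ == "__main__":
    address = ["246", "beverly", "road"]
    print(min_slop(["246", "road"], address))             # 1
    print(min_slop(["246", "beverly", "road"], address))  # 0
```

In this model the phrase "246 Road" needs a slop of 1 against "246 Beverly Road", because "road" sits one position later in the document than in the query.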

Prerequisites: 

Set up the accounts index as explained here.

Solution Steps: 

Missing Terms (Without Slop)

Assume we need to search for the address "246 Beverly Road", but have forgotten "Beverly":

GET /accounts/_search
{
  "query": {
    "match_phrase": {
      "address": "246 Road"
    }
  }
}

Note: This will not return any results, because match_phrase by default (slop 0) requires the terms to be adjacent and in the same order.

 

Missing Terms (With Slop)

Assume we need to search for the address "246 Beverly Road", but have forgotten "Beverly":

GET /accounts/_search
{
  "query": {
    "match_phrase": {
      "address": {
        "query": "246 Road",
        "slop" : 1
      }
    }
  }
}

Note:

  1. This returns the record with address "246 Beverly Road", because a slop of 1 allows a term to be moved by one position to produce a match.

  2. You may specify a very high slop value (e.g. 100) for proximity searches: documents that contain both words within that distance are returned, and documents in which the words are closer together get higher relevance.
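Why closer words rank higher can be illustrated with a small Python sketch. It assumes that a sloppy phrase match is weighted by 1 / (1 + distance), which is how Lucene's sloppy phrase scoring is commonly described; the function name sloppy_weight is hypothetical, and the real relevance score also depends on other factors such as term and document statistics.

```python
def sloppy_weight(distance):
    """Assumed weight of a phrase match whose terms had to move `distance` positions."""
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    # An exact phrase (distance 0) gets full weight; the weight
    # shrinks as the matched terms drift further apart.
    for distance in (0, 1, 5, 50):
        print(distance, round(sloppy_weight(distance), 3))
```

Under this assumption a high slop value still returns documents where the words are far apart, but they contribute much less to the score than documents where the words are adjacent.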

 

Spelling mistake (without fuzziness)

Assume we are searching for "Hamilton", but typed "Hamiltn" with the o missing:

GET /accounts/_search
{
  "query": {
    "match": {
      "city": "Hamiltn"
    }
  }
}

Note: This will not return any results.

 

Spelling mistake (with fuzziness)

Assume we are searching for "Hamilton", but typed "Hamiltn" with the o missing:

GET /accounts/_search
{
  "query": {
    "fuzzy": {
      "city":{
        "value": "Hamiltn",
        "fuzziness": "2"
      }
    }
  }
}

Note:

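  1. This returns the record with city "Hamilton".

  2. We set "fuzziness" to 2 because "Hamilton" is stored as "hamilton" in the inverted index, while the fuzzy query is a term-level query and does not analyze (lowercase) the search text "Hamiltn". Two edits are therefore needed: one to change H to h and one to insert the missing o.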

  3. You may also use "fuzziness": "AUTO" which means:

    • 0 corrections for 1-2 character strings.

    • 1 correction for 3-5 character strings.

    • 2 corrections for strings with length more than 5.

  4. Fuzziness in Elasticsearch is measured as Levenshtein edit distance.
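The edit counts in the notes above can be checked with a short Python sketch. It is a plain Levenshtein implementation (insert, delete, substitute) written for illustration, not the code Elasticsearch uses; the function names levenshtein and auto_fuzziness are assumptions, and auto_fuzziness simply encodes the AUTO length thresholds listed above.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed for a term under the AUTO length thresholds listed above."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


if __name__ == "__main__":
    print(levenshtein("Hamiltn", "hamilton"))  # 2: H -> h, plus the missing o
    print(levenshtein("hamiltn", "hamilton"))  # 1: only the missing o
    print(auto_fuzziness("Hamiltn"))           # 2: the term has 7 characters
```

Since "Hamiltn" has seven characters, AUTO would also allow the two edits needed here. If the search text were lowercased first, a single edit would be enough.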

 

TODO

  1. Add example to demo proximity search relevance.

