Problem:
Demonstrate how slop and fuzziness in Elasticsearch handle missing terms, proximity searches, and spelling mistakes.
Solution Summary:
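The slop parameter of a match_phrase query sets how far a term may be moved (in any direction) for the phrase to still match. It handles missing terms and proximity searches. The fuzziness parameter sets how many character edits are allowed when matching a term. It handles spelling mistakes.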
Prerequisites:
Set up the accounts index as explained here.
Solution Steps:
Missing Terms (Without Slop)
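Assume we need to search for the address "246 Beverly Road" but have forgotten "Beverly":
GET /accounts/_search
{
  "query": {
    "match_phrase": {
      "address": "246 Road"
    }
  }
}
Note: This returns no results, because by default (slop 0) match_phrase requires the terms to be adjacent and in the same order.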
Missing Terms (With Slop)
Assume we need to search for the address "246 Beverly Road" but have forgotten "Beverly":
GET /accounts/_search
{
  "query": {
    "match_phrase": {
      "address": {
        "query": "246 Road",
        "slop": 1
      }
    }
  }
}
Note:
- This returns the record with the address "246 Beverly Road", because a slop of 1 allows a term to be moved by one position to get a match.
- You may specify a very high slop value (e.g. 100) for proximity searches: documents containing both words are returned, and documents in which the words are closer together get a higher relevance score.
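The slop arithmetic can be pictured with a small Python sketch. This is a simplified model, not Elasticsearch's actual implementation: it assumes whitespace tokenization with lowercasing and that each query term occurs only once in the document. The helper name required_slop is hypothetical.

```python
from typing import Optional


def required_slop(query: str, document: str) -> Optional[int]:
    """Estimate the smallest slop at which a phrase query would match.

    Simplified model (an assumption, not the real Lucene code): for each
    query term, take the difference between its position in the document
    and its position in the query. The slop needed is the spread between
    the largest and the smallest of those differences.
    """
    query_terms = query.lower().split()
    doc_terms = document.lower().split()

    offsets = []
    for query_pos, term in enumerate(query_terms):
        if term not in doc_terms:
            # Slop only moves terms; it cannot make up for a term that
            # does not occur in the document at all.
            return None
        doc_pos = doc_terms.index(term)  # first occurrence only
        offsets.append(doc_pos - query_pos)

    return max(offsets) - min(offsets)


# "road" sits one position further right in the document than in the query.
print(required_slop("246 Road", "246 Beverly Road"))          # 1
# An exact phrase needs no slop.
print(required_slop("246 Beverly Road", "246 Beverly Road"))  # 0
# Reversing the word order costs more moves.
print(required_slop("Road 246", "246 Beverly Road"))          # 3
```

Under this model, a query matches whenever its slop value is at least the number returned, which is why a very high slop turns the phrase query into a proximity search.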
Spelling Mistake (Without Fuzziness)
Assume we are searching for "Hamilton" but typed "Hamiltn", with the "o" missing:
GET /accounts/_search
{
  "query": {
    "match": {
      "city": "Hamiltn"
    }
  }
}
Note: This returns no results.
Spelling Mistake (With Fuzziness)
Assume we are searching for "Hamilton" but typed "Hamiltn", with the "o" missing:
GET /accounts/_search
{
  "query": {
    "fuzzy": {
      "city": {
        "value": "Hamiltn",
        "fuzziness": "2"
      }
    }
  }
}
Note:
- This returns the record with city "Hamilton".
- We set "fuzziness" to 2 because "Hamilton" is stored as "hamilton" in the inverted index, while the search term is compared as typed: one edit to change "H" to "h" and one to insert the missing "o".
- You may also use "fuzziness": "AUTO", which sets the allowed number of edits based on the length of the term:
  - 0 edits for strings of 1-2 characters.
  - 1 edit for strings of 3-5 characters.
  - 2 edits for strings longer than 5 characters.
- Fuzziness in Elasticsearch is measured as the Levenshtein edit distance.
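To check the edit counts from the notes above by hand, the following Python sketch computes the plain Levenshtein distance and mirrors the "AUTO" length thresholds listed in the notes. It is an illustration only: the function names levenshtein and auto_fuzziness are hypothetical, and the sketch does not count a swap of two adjacent characters as a single edit, which Elasticsearch may do depending on its settings.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions
    needed to turn string a into string b (classic dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Allowed edits for "fuzziness": "AUTO", per the thresholds above."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


# Term as typed vs. term as stored in the inverted index.
print(levenshtein("Hamiltn", "hamilton"))  # 2 (H -> h, plus the missing o)
# If the typed term were lowercased first, one edit would be enough.
print(levenshtein("hamiltn", "hamilton"))  # 1
# "Hamiltn" has 7 characters, so AUTO would allow 2 edits.
print(auto_fuzziness("Hamiltn"))           # 2
```

A term counts as a fuzzy match in this model when its distance to an indexed term is no greater than the allowed number of edits.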
TODO
- Add example to demo proximity search relevance.