martes, 23 de mayo de 2017

Emparejamiento difuso - Fuzzy Matching Algorithms To the Rescue

Levenshtein is a great algorithm to detect typos in a search query. It operates based on how distant one search term is from another term. Starting with the “source” word, it counts the number of operations (additions, subtractions, substitutions) it takes to arrive at the “destination” word. This make the Levenshtein algorithm particularly good at catching seach typos, and uncommon spellings.
(https://mrfrosti.com/2011/08/22/fuzzy-matching-in-postgresql-with-nicknames/)

Previamente habilitar la extensión:

CREATE EXTENSION fuzzystrmatch;

Ejemplo:

SELECT levenshtein('iden', 'idem');

Ejemplo: comparando el campo NOMBRE de dos tablas:

SELECT a.nombre1, b.nombre2, levenshtein(a.nombre1, b.nombre2) as leven
from tabla1 a, tabla2 b
where 
and levenshtein(a.nombre_com, b.nombre_c_1) >=  0
order by leven

links relacionados:

https://www.postgresql.org/docs/9.3/static/functions-matching.html
https://www.laurivan.com/fuzzy-string-matching-in-postgresql/
https://www.rdegges.com/2013/easy-fuzzy-text-searching-with-postgresql/

No hay comentarios.:

Publicar un comentario