Discussion about this post

Joe Corliss

When I've encountered fuzzy de-duplication in the past, it was usually sufficient to normalize the data. For phone numbers, you can remove all non-digit characters, and you can normalize addresses using a service like Smarty. Then do an exact comparison. However, normalization might not be good enough for all types of data.
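Here is a minimal Python sketch of the phone-number case: strip every non-digit character, then de-duplicate by exact comparison on the result. It assumes numbers are written in a consistent national format without country codes. `normalize_phone` and `dedupe_phones` are hypothetical helper names, and the Smarty address step is not shown.

```python
import re


def normalize_phone(raw: str) -> str:
    """Keep only digit characters, e.g. '(555) 123-4567' -> '5551234567'."""
    # Assumption: no country-code handling; '+1 555...' keeps the leading '1'.
    return re.sub(r"\D", "", raw)


def dedupe_phones(records: list[str]) -> list[str]:
    """Return the first raw record seen for each normalized phone number."""
    seen: set[str] = set()
    unique: list[str] = []
    for raw in records:
        key = normalize_phone(raw)  # exact comparison happens on this key
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique


phones = ["(555) 123-4567", "555.123.4567", "555-123-4567", "555 987 6543"]
print(dedupe_phones(phones))  # ['(555) 123-4567', '555 987 6543']
```

Because the key is just the digit string, `+1 555 123 4567` normalizes to `15551234567` and would not match `5551234567`. That is one concrete case where plain normalization falls short.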

Panna Lal Patodia

The method you provided is far from perfect. There are significantly better methods available for fuzzy matching with a specified number of errors; look up NMSLIB, for example. We have developed our own method that surpasses NMSLIB, providing faster results and matching more strings. - Panna Lal Patodia, CEO, Patodia Infotech Private Limited

