Discussion about this post

User's avatar
Joe Corliss's avatar

When I encountered fuzzy de-duplication in the past, it was usually sufficient to normalize the data. For phone numbers, you can remove all non-digit characters, and you can normalize addresses using a service like Smarty. Then do exact comparison. However, normalization might not be good enough for all types of data.

Panna Lal Patodia's avatar

The method you provided is far from perfect. There are significantly better methods available for checking fuzzy matching with a specified number of errors. You can search for NMSLIB. We have developed our own method that surpasses NMSLIB, providing faster results and matching more strings. - Panna Lal Patodia, CEO, Patodia Infotech Private Limited

4 more comments...

No posts

Ready for more?