Find duplicate contacts in linear time (that is, without having to compare every record with every other record)
I’ve implemented an experimental approach using the simhash algorithm. It works OK, but still needs performance enhancements before we can put it in production.
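To illustrate how simhash can avoid an all-pairs comparison, here is a minimal Python sketch. It is not the experimental implementation mentioned above. The field names (`name`, `email`), the trigram features, the MD5-based feature hash, the split into 4 bands and the `max_distance` threshold of 3 are all assumptions chosen for the example, and `tokens` and `candidate_duplicates` are hypothetical helpers. Each contact gets a 64-bit fingerprint in which similar records tend to differ in only a few bits. Because the fingerprint is split into 4 bands, any two records within Hamming distance 3 must match exactly on at least one band. That means only records sharing a band bucket need to be compared.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

BITS = 64   # fingerprint width in bits
BANDS = 4   # assumed: 4 bands of 16 bits each

def tokens(contact: dict) -> list[str]:
    """Hypothetical feature extraction: character trigrams of name and email words."""
    text = f"{contact.get('name', '')} {contact.get('email', '')}".lower()
    features = []
    for word in re.findall(r"[a-z0-9]+", text):
        padded = f"#{word}#"  # pad so short words still produce at least one trigram
        features.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return features

def simhash(features: list[str], bits: int = BITS) -> int:
    """Simhash: every feature votes +1 or -1 on each bit position of the fingerprint."""
    weights = [0] * bits
    for feature in features:
        # Assumed feature hash: first 8 bytes of MD5 (used for spreading, not security)
        h = int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def candidate_duplicates(contacts: dict, max_distance: int = 3) -> set[tuple]:
    """Return pairs of contact ids whose fingerprints differ in at most max_distance bits.

    With max_distance < BANDS, two fingerprints that close must agree exactly on
    at least one band (pigeonhole principle), so comparing only within band
    buckets finds every such pair without scanning all pairs of records.
    Contact ids must be sortable (e.g. all ints or all strings).
    """
    band_bits = BITS // BANDS
    mask = (1 << band_bits) - 1
    fingerprints = {cid: simhash(tokens(c)) for cid, c in contacts.items()}

    # One pass over the records: file each fingerprint under each of its bands
    buckets = defaultdict(list)
    for cid, fp in fingerprints.items():
        for band in range(BANDS):
            buckets[(band, (fp >> (band * band_bits)) & mask)].append(cid)

    # Pairwise checks happen only inside buckets
    pairs = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            if hamming(fingerprints[a], fingerprints[b]) <= max_distance:
                pairs.add((a, b))
    return pairs

contacts = {
    1: {"name": "Jane Smith", "email": "jane.smith@example.com"},
    2: {"name": "Jane Smith", "email": "jane.smith@example.com"},
    3: {"name": "Bob Jones", "email": "bob@example.org"},
}
# Identical records 1 and 2 share a fingerprint, so (1, 2) is always reported
print(candidate_duplicates(contacts))
```

Building the fingerprints and buckets is a single pass over the records. The pairwise checks only run inside each bucket, so the cost stays close to linear as long as no bucket grows very large. Very common features (for example, many contacts sharing one company email domain) can produce large buckets, which would be one thing to watch when tuning performance.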
A question for our users: how important is this to you? It's quite a lot of work, so we've not prioritised it yet.
Also, how do you anticipate using it? A regular monthly cleanup of flagged records? Or something else?