I want to filter out duplicate customer names from a database. A single customer may have more than one entry in the system under the same name, with only slight differences in spelling. For example, a customer named Brook may have three entries in the system with these variations:
Let's assume the name is stored in a single database column. I would like to know the different mechanisms for identifying such duplicates among, say, 100,000 records. We could use regular expressions in C# to iterate through all the records, or some other pattern-matching technique, or we could export the records to whatever best fits such queries (for example, SQL with regular-expression capabilities).
This is what I came up with as a solution
So please suggest any ideas.
The Double Metaphone algorithm, published in 2000, is a phonetic algorithm that improves on the much older Soundex algorithm, which was patented in 1918.
The article has links to Double Metaphone implementations in many languages.
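To give a rough idea of how a phonetic key helps here, below is a minimal Python sketch. Note that it implements classic American Soundex, not Double Metaphone, because Soundex is short enough to show in full; a real Double Metaphone implementation would slot into the same grouping step. The function names `soundex` and `group_by_sound`, and the sample spellings, are made up for illustration and are not taken from the question.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels, Y, H and W have no digit.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex key for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (first + "".join(digits) + "000")[:4]


def group_by_sound(names):
    """Group names by Soundex key; keep only keys shared by 2+ names."""
    groups = defaultdict(list)
    for n in names:
        groups[soundex(n)].append(n)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical sample data, not the spellings from the question.
candidates = group_by_sound(["Brook", "Bruk", "Brooke", "Smith"])
print(candidates)
```

Each name is keyed once, so a single pass over 100,000 records is enough to bucket them; the buckets are only candidate duplicates and would still need a manual or stricter check, since unrelated names can share a key.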