How to identify similar words with different spellings

Date: 2023-02-17

Problem description

I want to filter duplicate customer names out of a database. A single customer may have more than one entry in the system under the same name, with only slight differences in spelling. For example, a customer named Brook may have three entries with these variations:

Brook Berta
Bruck Berta
Biruk Berta

Assume the name is stored in a single database column. I would like to know the different mechanisms for identifying such duplicates among, say, 100,000 records. We could use regular expressions or some other pattern-matching technique in C# to iterate through all the records, or we could export the records to whatever best suits such queries (for example, SQL with regular-expression capabilities).

This is what I have in mind as a solution:

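Write C# code that iterates over every record
Extract only the consonant letters, in order (in the case above: BrKBrt)
Search the other records for the same consonant pattern, treating similar-sounding letters such as (C, K), (C, S), (F, PH) as equivalent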
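
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than C#, and it is not a tested solution for real customer data. The names `consonant_key`, `find_duplicate_groups` and `SOUND_ALIKE` are hypothetical, and the sample names are illustrative Latin-script spellings. Several simplifying assumptions are made: C, K and S are folded into one class (coarser than the pairs listed above, since it also makes K and S match), PH is rewritten to F, doubled adjacent sounds such as "ck" count once, and Y is treated as a consonant.

```python
from collections import defaultdict

VOWELS = set("aeiou")

# Assumption: C, K and S all fold into one class so that both the
# (C, K) and (C, S) pairs match. This also makes K match S.
SOUND_ALIKE = {"c": "k", "s": "k"}


def consonant_key(name: str) -> str:
    """Return the ordered consonant skeleton of a name, e.g. 'BRKBRT'."""
    # Assumption: rewrite the digraph PH to F before scanning letters.
    text = name.lower().replace("ph", "f")
    key = []
    prev = ""  # last consonant class seen with no vowel or gap in between
    for ch in text:
        if not ch.isalpha() or ch in VOWELS:
            prev = ""  # vowels, spaces and punctuation break a run
            continue
        code = SOUND_ALIKE.get(ch, ch)
        if code != prev:  # collapse runs such as "ck" -> one K
            key.append(code)
        prev = code
    return "".join(key).upper()


def find_duplicate_groups(names):
    """Group names by key in one pass; keep keys shared by 2+ names."""
    groups = defaultdict(list)
    for name in names:
        groups[consonant_key(name)].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}


names = ["Brook Berta", "Bruck Berta", "Biruk Berta", "Alan Smith"]
print(find_duplicate_groups(names))
# {'BRKBRT': ['Brook Berta', 'Bruck Berta', 'Biruk Berta']}
```

Because each record is reduced to a key once and then grouped through a dictionary, the work grows linearly with the number of records instead of comparing every pair, which matters at 100,000 rows. The key is deliberately lossy, so the groups it produces are candidates to review, not confirmed duplicates.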

Any ideas or suggestions would be appreciated.
