        SQL Function - Fuzzy Matching with the Levenshtein Distance Algorithm

        Date: 2023-06-06

                  This article describes how to do fuzzy matching in SQL with a Levenshtein distance function while returning only the lowest-scoring match for each row.

                  Problem Description

                  Problem: Need SQL function to return the 'lowest' matching value using the Levenshtein algorithm.

                  Code:

                  
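                  CREATE FUNCTION dbo.ufn_levenshtein(@s1 nvarchar(3999), @s2 nvarchar(3999))
                  RETURNS int
                  AS
                  BEGIN
                   DECLARE @s1_len int, @s2_len int
                   DECLARE @i int, @j int, @s1_char nchar, @c int, @c_temp int
                   -- @cv1 holds the previous row of the distance matrix, @cv0 the row being built;
                   -- each cell is stored as a 2-byte binary value
                   DECLARE @cv0 varbinary(8000), @cv1 varbinary(8000)

                   SELECT
                    @s1_len = LEN(@s1),
                    @s2_len = LEN(@s2),
                    @cv1 = 0x0000,
                    @j = 1, @i = 1, @c = 0

                   -- First row: distance from the empty prefix of @s1 to each prefix of @s2 (0, 1, 2, ...)
                   WHILE @j <= @s2_len
                    SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1

                   WHILE @i <= @s1_len
                   BEGIN
                    SELECT
                     @s1_char = SUBSTRING(@s1, @i, 1),
                     @c = @i,
                     @cv0 = CAST(@i AS binary(2)),
                     @j = 1

                    WHILE @j <= @s2_len
                    BEGIN
                     -- Cost via the cell to the left, plus one
                     SET @c = @c + 1
                     -- Cost via the diagonal cell (substitution, free when the characters match)
                     SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j-1, 2) AS int) +
                      CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END
                     IF @c > @c_temp SET @c = @c_temp
                     -- Cost via the cell above, plus one
                     SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j+1, 2) AS int)+1
                     IF @c > @c_temp SET @c = @c_temp
                     SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1
                    END

                    SELECT @cv1 = @cv0, @i = @i + 1
                   END

                   -- An empty @s1 never enters the loop, so its distance is the length of @s2
                   RETURN CASE WHEN @s1_len = 0 THEN @s2_len ELSE @c END
                  END
                  -- CREATE FUNCTION must run in its own batch; GO is the SSMS/sqlcmd batch separator
                  GO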
                  
                  
                  
                  
                  IF OBJECT_ID('tempdb..#ExistingCustomers') IS NOT NULL
                      DROP TABLE #ExistingCustomers;

                  CREATE TABLE #ExistingCustomers
                  (
                      Customer VARCHAR(255),
                      ID INT
                  );
                  
                  INSERT #ExistingCustomers SELECT 'Ed''s Barbershop',  1002
                  INSERT #ExistingCustomers SELECT 'GroceryTown',  1003
                  INSERT #ExistingCustomers SELECT 'Candy Place',  1004
                  INSERT #ExistingCustomers SELECT 'Handy Man',  1005
                  
                  
                  
                  IF OBJECT_ID('tempdb..#POTENTIALCUSTOMERS') IS NOT NULL
                      DROP TABLE #POTENTIALCUSTOMERS;
                  
                  CREATE TABLE #POTENTIALCUSTOMERS(Customer VARCHAR(255));
                  
                  INSERT #POTENTIALCUSTOMERS SELECT 'Eds Barbershop'
                  INSERT #POTENTIALCUSTOMERS SELECT 'Grocery Town'
                  INSERT #POTENTIALCUSTOMERS SELECT 'Candy Place'
                  INSERT #POTENTIALCUSTOMERS SELECT 'Handee Man'
                  INSERT #POTENTIALCUSTOMERS SELECT 'The Apple Farm'
                  INSERT #POTENTIALCUSTOMERS SELECT 'Ride-a-Long Bikes'
                  
                  
                  SELECT a.Customer,
                         b.ID,
                         b.Customer AS cust,
                         dbo.ufn_levenshtein(REPLACE(a.Customer, ' ', ''), REPLACE(b.Customer, ' ', '')) AS ValueLev
                  FROM #POTENTIALCUSTOMERS a
                       LEFT JOIN #ExistingCustomers b ON dbo.ufn_levenshtein(REPLACE(a.Customer, ' ', ''), REPLACE(b.Customer, ' ', '')) < 15;
                  
                  

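                  This returns one row for every existing customer whose distance is below 15, so a potential customer can appear several times. (Result image not available.)

                  What I would like to return is a single row per potential customer, holding the match with the lowest Levenshtein value. (Result image not available.)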

                  Explanation: The results should be the 'lowest' values from the Levenshtein algorithm. For two rows, The Apple Farm and Ride-a-Long Bikes, more than one existing customer has the same lowest score; in that case any of the tied matches is fine, as long as only one is returned.
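To make the scores in the explanation concrete, the function can be spot-checked on a few pairs before it is used in a join. This is only a sketch; it assumes `dbo.ufn_levenshtein` from the listing above has already been created. The first pair is the textbook kitten/sitting example, and the other two are customer names from the sample data with spaces removed.

```sql
-- Spot checks; assumes dbo.ufn_levenshtein from the listing above exists.
SELECT dbo.ufn_levenshtein(N'kitten', N'sitting')                AS kitten_sitting,  -- 3
       dbo.ufn_levenshtein(N'EdsBarbershop', N'Ed''sBarbershop') AS eds_barbershop,  -- 1 (one inserted apostrophe)
       dbo.ufn_levenshtein(N'HandeeMan', N'HandyMan')            AS handee_man;      -- 2 (one substitution, one deletion)
```

A score of 0 means the two strings are identical after the spaces are stripped, which is why the lowest value per potential customer is the best candidate match.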

                  References:

                  SQL Fuzzy Join - MSSQL

                  http://www.kodyaz.com/articles/fuzzy-string-matching-using-levenshtein-distance-sql-server.aspx

                  Solution

                  You can use a CTE to get the result you want if you partition by potential customer and order each partition by ValueLev:

                  ;WITH CTE AS
                  (
                      SELECT  RANK() OVER (PARTITION BY a.Customer
                                           ORDER BY dbo.ufn_levenshtein(REPLACE(a.Customer, ' ', ''), REPLACE(b.Customer, ' ', '')) ASC) AS RowNbr,
                              a.Customer,
                              b.ID,
                              b.Customer AS cust,
                              dbo.ufn_levenshtein(REPLACE(a.Customer, ' ', ''), REPLACE(b.Customer, ' ', '')) AS ValueLev
                        FROM  #POTENTIALCUSTOMERS a
                          LEFT JOIN #ExistingCustomers b ON dbo.ufn_levenshtein(REPLACE(a.Customer, ' ', ''), REPLACE(b.Customer, ' ', '')) < 15
                  )
                  SELECT  Customer,
                          MIN(ID) AS ID,
                          MIN(cust) AS cust,
                          ValueLev
                    FROM  CTE
                    WHERE CTE.RowNbr = 1
                    GROUP BY Customer, ValueLev;
                  

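                  RANK() gives every tied row the same rank, so filtering on RowNbr = 1 can still leave several rows per potential customer. As you don't mind which result is returned when ValueLev is duplicated, use GROUP BY and MIN to reduce the results to one row per potential customer. Note that MIN(ID) and MIN(cust) are evaluated independently, so on a tie they can come from different rows: in the Ride-a-Long Bikes row of the output, ID 1003 belongs to GroceryTown while cust shows Candy Place.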

                  Output:

                  Customer            ID      cust            ValueLev
                  Candy Place         1004    Candy Place     0
                  Grocery Town        1003    GroceryTown     0
                  Eds Barbershop      1002    Ed's Barbershop 1
                  Handee Man          1005    Handy Man       2
                  The Apple Farm      1004    Candy Place     9
                  Ride-a-Long Bikes   1003    Candy Place     14
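If the ID and the customer name need to come from the same matched row, one possible variation is to replace RANK() with ROW_NUMBER() and add an explicit tiebreaker, so each partition has exactly one row numbered 1 and no GROUP BY is needed. This is a sketch rather than part of the original answer: it assumes the function and temp tables defined earlier, and the choice of `b.ID` as the tiebreaker is arbitrary.

```sql
-- Sketch: one row per potential customer, with ID and cust taken from the same match.
-- Assumes dbo.ufn_levenshtein, #POTENTIALCUSTOMERS and #ExistingCustomers already exist.
;WITH CTE AS
(
    SELECT  ROW_NUMBER() OVER (PARTITION BY a.Customer
                               ORDER BY dbo.ufn_levenshtein(REPLACE(a.Customer, ' ', ''),
                                                            REPLACE(b.Customer, ' ', '')) ASC,
                                        b.ID ASC) AS RowNbr,   -- b.ID breaks ties (arbitrary choice)
            a.Customer,
            b.ID,
            b.Customer AS cust,
            dbo.ufn_levenshtein(REPLACE(a.Customer, ' ', ''),
                                REPLACE(b.Customer, ' ', '')) AS ValueLev
      FROM  #POTENTIALCUSTOMERS a
        LEFT JOIN #ExistingCustomers b
               ON dbo.ufn_levenshtein(REPLACE(a.Customer, ' ', ''),
                                      REPLACE(b.Customer, ' ', '')) < 15
)
SELECT  Customer, ID, cust, ValueLev
  FROM  CTE
  WHERE RowNbr = 1
  ORDER BY ValueLev;
```

Because the tiebreaker picks the lowest ID among tied matches, a row such as Ride-a-Long Bikes would be reported with ID 1003 and its own name, GroceryTown, instead of a name taken from a different tied row.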
                  
