我已经研究了一段时间,但找不到从 SQL Server 中的逗号分隔字符串和管道分隔字符串中删除重复字符串的方法.
I have been looking into this for a while now and I cannot find a way to remove duplicate strings from a comma-separated as well as pipeline seperated string in SQL Server.
给定字符串
test1,test2,test1|test2,test3|test4,test4|test4
有谁知道你会如何返回 test1,test2,test3,test4?
does anyone know how would you return test1,test2,test3,test4?
方法
以下方法可用于对分隔的值列表进行重复数据删除.
The following approach can be used to de-duplicate a delimited list of values.
REPLACE() 函数将不同的分隔符转换为相同的分隔符.REPLACE() 函数注入 XML 结束和开始标记以创建 XML 片段CAST(expr AS XML)函数将上述片段转换为XML数据类型OUTER APPLY 应用表值函数 nodes() 将 XML 片段拆分为其组成的 XML 标记.这将在单独的行中返回每个 XML 标记.value() 函数仅从 XML 标记中提取值,并使用指定的数据类型返回值.DISTINCT 关键字的使用现在可以删除重复的行(即值).FOR XML PATH('') 子句将多行中的值连接成一行.REPLACE() function to convert different delimiters into the same delimiter. REPLACE() function to inject XML closing and opening tags to create an XML fragment CAST(expr AS XML) function to convert the above fragment into the XML data type OUTER APPLY to apply the table-valued function nodes() to split the XML fragment into its constituent XML tags. This returns each XML tag on a separate row. value() function, and returns the value using the specified data type. DISTINCT keyword now removes duplicate rows (i.e. values). FOR XML PATH('') clause to concatenate the values across multiple rows into a single row. 查询
将上述方法放在查询表单中:
Putting the above approach in query form:
SELECT DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)') + ','
FROM (
-- This query returns the following in theDataXml column:
-- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
-- i.e. it has turned the original delimited data into an XML fragment
SELECT
DataTable.DataColumn AS DataRaw
, CAST(
'<tag>'
-- First replace commas with pipes to have only a single delimiter
-- Then replace the pipe delimiters with a closing and opening tag
+ replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
-- Add a final set of closing tags
+ '</tag>'
AS XML) AS DataXml
FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
-- Running the query without the following line will return the data in separate rows
-- Running the query with the following line returns the rows concatenated, i.e. it returns:
-- test1,test2,test3,test4,
FOR XML PATH('')
输入&结果
给定输入:
test1,test2,test1|test2,test3|test4,test4|test4
test1,test2,test1|test2,test3|test4,test4|test4
上面的查询会返回结果:
The above query will return the result:
测试1,测试2,测试3,测试4,
test1,test2,test3,test4,
注意末尾的尾随逗号.我将把它作为练习留给你来删除它.
Notice the trailing comma at the end. I'll leave it as an exercise to you to remove that.
重复次数
OP 在评论中请求我如何获得重复的计数?在单独的列中".
OP requested in a comment "how do i get t5he count of duplicates as well? in a seperate column".
最简单的方法是使用上述查询,但删除最后一行 FOR XML PATH('').然后,计算上述查询中 SELECT 表达式返回的所有值和不同值(即 PivotedTable.PivotedColumn.value('.','nvarchar(max)')).所有值的计数与不同值的计数之间的差值是重复值的计数.
The simplest way would be to use the above query but remove the last line FOR XML PATH(''). Then, counting all values and distinct values returned by the SELECT expression in the above query (i.e. PivotedTable.PivotedColumn.value('.','nvarchar(max)')). The difference between the count of all values and the count of distinct values is the count of duplicate values.
SELECT
COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfAllValues
, COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfUniqueValues
-- The difference of the previous two counts is the number of duplicate values
, COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)'))
- COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfDuplicateValues
FROM (
-- This query returns the following in theDataXml column:
-- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
-- i.e. it has turned the original delimited data into an XML fragment
SELECT
DataTable.DataColumn AS DataRaw
, CAST(
'<tag>'
-- First replace commas with pipes to have only a single delimiter
-- Then replace the pipe delimiters with a closing and opening tag
+ replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
-- Add a final set of closing tags
+ '</tag>'
AS XML) AS DataXml
FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
对于上面显示的相同输入,此查询的输出是:
For the same input shown above, the output of this query is:
CountOfAllValues CountOfUniqueValues CountOfDuplicateValues
---------------- ------------------- ----------------------
8 4 4
这篇关于从逗号或管道运算符字符串中删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!
修改现有小数位信息Modify Existing decimal places info(修改现有小数位信息)
多次指定相关名称“CONVERT"The correlation name #39;CONVERT#39; is specified multiple times(多次指定相关名称“CONVERT)
T-SQL 左连接不返回空列T-SQL left join not returning null columns(T-SQL 左连接不返回空列)
将迭代查询更改为基于关系集的查询Change an iterative query to a relational set-based query(将迭代查询更改为基于关系集的查询)
将零连接到 sql server 选择值仍然显示 4 位而不是concatenate a zero onto sql server select value shows 4 digits still and not 5(将零连接到 sql server 选择值仍然显示 4 位而不是 5)
SQL 显示两个日期之间的月份和年份SQL to display both month and year between two dates(SQL 显示两个日期之间的月份和年份)