Why does Char.IsDigit return true for chars which can't be parsed to int?

Date: 2023-04-01

Problem description

I often use Char.IsDigit to check whether a char is a digit, which is especially handy in LINQ queries to pre-check input for int.Parse, as here: "123".All(Char.IsDigit).

But there are chars that are digits yet can't be parsed to int, such as ۵.

// true
bool isDigit = Char.IsDigit('۵'); 

var cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
int num;
// false
bool isIntForAnyCulture = cultures
    .Any(c => int.TryParse('۵'.ToString(), NumberStyles.Any, c, out num)); 

Why is that? Is my int.Parse pre-check via Char.IsDigit therefore incorrect?

There are 310 chars which are digits:

List<char> digitList = Enumerable.Range(0, UInt16.MaxValue)
   .Select(i => Convert.ToChar(i))
   .Where(c => Char.IsDigit(c))
   .ToList(); 

Here's the implementation of Char.IsDigit in .NET 4 (ILSpy):

public static bool IsDigit(char c)
{
    if (char.IsLatin1(c))
    {
        return c >= '0' && c <= '9';
    }
    return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
}

So why are there chars that belong to the DecimalDigitNumber category ("Decimal digit character, that is, a character in the range 0 through 9...") which can't be parsed to an int in any culture?

Solution

That's because Char.IsDigit checks for every character in the Unicode "Number, Decimal Digit" (Nd) category, as listed here:

http://www.fileformat.info/info/unicode/category/Nd/list.htm

That doesn't mean the character is a valid numeric character in the current locale. In fact, int.Parse() can ONLY parse the ASCII digits 0 through 9, regardless of the locale setting.

For example, this doesn't work:

int test = int.Parse("٣", CultureInfo.GetCultureInfo("ar"));

This fails even though ٣ is a valid Arabic digit character and "ar" is the Arabic locale identifier.

The Microsoft article "How to: Parse Unicode Digits" states that:

The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039. The .NET Framework parses all other Unicode digits as characters.
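Given that restriction, a pre-check that agrees with int.Parse has to test for the ASCII range rather than the whole Nd category. The following is a minimal sketch, not part of the original answer; the class and method names (DigitChecks, IsAsciiDigit, IsAsciiDigitsOnly, TryParseDigits) are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class DigitChecks
{
    // True only for U+0030..U+0039, the digits that int.Parse accepts.
    public static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';

    // Hypothetical pre-check: non-empty and made of ASCII digits only.
    // A string that passes can still be too large to fit in an int.
    public static bool IsAsciiDigitsOnly(string s) =>
        !string.IsNullOrEmpty(s) && s.All(IsAsciiDigit);

    // Often simpler: let TryParse validate the format and the range in one step.
    // NumberStyles.None allows digits only (no sign, whitespace, or separators).
    public static bool TryParseDigits(string s, out int value) =>
        int.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out value);
}
```

With this sketch, DigitChecks.IsAsciiDigitsOnly("123") is true while DigitChecks.IsAsciiDigitsOnly("۵") is false, which matches what int.TryParse reports for the same inputs.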

However, note that you can use char.GetNumericValue() to convert a Unicode numeric character to its numeric equivalent as a double.

The return value is a double rather than an int because of characters like this:

Console.WriteLine(char.GetNumericValue('¼')); // Prints 0.25
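To make the difference between "digit" and "numeric" concrete, the sketch below compares Char.IsDigit, char.GetNumericValue and CharUnicodeInfo.GetDecimalDigitValue for a single character. The last method is not mentioned in the answer above; it returns an int and yields -1 for anything that is not a decimal digit. The class and method names (NumericValueDemo, Describe) are hypothetical.

```csharp
using System;
using System.Globalization;

public static class NumericValueDemo
{
    // Hypothetical helper: describes how the three APIs classify one character.
    public static string Describe(char c)
    {
        bool isDigit = char.IsDigit(c);            // true only for category Nd
        double numeric = char.GetNumericValue(c);  // -1 if the char has no numeric value
        int decimalDigit = CharUnicodeInfo.GetDecimalDigitValue(c); // -1 if not a decimal digit

        return string.Format(CultureInfo.InvariantCulture,
            "U+{0:X4} IsDigit={1} numeric={2} decimalDigit={3}",
            (int)c, isDigit, numeric, decimalDigit);
    }
}
```

For '٣' all three agree on the value 3. For '¼', IsDigit is false and GetDecimalDigitValue is -1, while GetNumericValue still returns 0.25.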

You could use something like this to convert all digit characters in a string into their ASCII equivalents:

public string ConvertNumericChars(string input)
{
    StringBuilder output = new StringBuilder();

    foreach (char ch in input)
    {
        if (char.IsDigit(ch))
        {
            double value = char.GetNumericValue(ch);

            if ((value >= 0) && (value <= 9) && (value == (int)value))
            {
                output.Append((char)('0'+(int)value));
                continue;
            }
        }

        output.Append(ch);
    }

    return output.ToString();
}
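As a possible follow-up, the normalization step can be combined with int.TryParse so that strings written with non-ASCII digits parse directly. This is only a sketch under stated assumptions: it swaps char.GetNumericValue for CharUnicodeInfo.GetDecimalDigitValue, which already returns an int (or -1), and the names UnicodeDigitParser, ToAsciiDigits and TryParseUnicodeInt are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UnicodeDigitParser
{
    // Hypothetical helper, not a framework API: maps every Unicode decimal
    // digit to its ASCII counterpart and leaves other characters unchanged.
    public static string ToAsciiDigits(string input)
    {
        var output = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            int digit = CharUnicodeInfo.GetDecimalDigitValue(ch); // -1 if not a decimal digit
            output.Append(digit >= 0 ? (char)('0' + digit) : ch);
        }
        return output.ToString();
    }

    // Normalizes the digits first, then parses with the invariant culture.
    public static bool TryParseUnicodeInt(string input, out int value)
    {
        value = 0;
        return input != null &&
               int.TryParse(ToAsciiDigits(input), NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out value);
    }
}
```

With this sketch, TryParseUnicodeInt("\u0663\u0664\u0665", out n) (the Arabic-Indic digits three, four, five) succeeds and sets n to 345. Two caveats: digits outside the Basic Multilingual Plane are stored as surrogate pairs and pass through unchanged, and a string mixing digits from different scripts is accepted, which may or may not be desirable.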
