I have UTF-8 text that may contain characters with diacritics, and I would like to check whether the first letter of the text is upper case or lower case. How can I do this?
In my opinion, a preg_ call is the most direct, concise, and reliable approach compared with the other solutions posted here.
echo preg_match('~^\p{Lu}~u', $string) ? 'upper' : 'lower';
My pattern breakdown:
~      # starting pattern delimiter
^      # match from the start of the input string
\p{Lu} # match exactly one uppercase letter (Unicode-safe)
~      # ending pattern delimiter
u      # enable Unicode matching
Please note where the ctype_ and < 'a' checks fail in this battery of tests.
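Code:
$tests = ['âa', 'Bbbbb', 'Éé', 'iou', 'Δδ'];
foreach ($tests as $test) {
    echo "\n{$test}:";
    // Unicode-aware regex: is the first character an uppercase letter?
    echo "\nPREG: " , preg_match('~^\p{Lu}~u', $test) ? 'upper' : 'lower';
    // ctype_upper() is byte/locale based
    echo "\nCTYPE: " , ctype_upper(mb_substr($test, 0, 1)) ? 'upper' : 'lower';
    // plain string comparison against 'a'
    echo "\n< a: " , mb_substr($test, 0, 1) < 'a' ? 'upper' : 'lower';
    // multibyte comparison: does uppercasing leave the character unchanged?
    $chr = mb_substr($test, 0, 1, "UTF-8");
    echo "\nMB: " , mb_strtoupper($chr, "UTF-8") == $chr ? 'upper' : 'lower';
}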
Output:
âa:
PREG: lower
CTYPE: lower
< a: lower
MB: lower
Bbbbb:
PREG: upper
CTYPE: upper
< a: upper
MB: upper
Éé: <-- trouble
PREG: upper
CTYPE: lower <-- uh oh
< a: lower <-- uh oh
MB: upper
iou:
PREG: lower
CTYPE: lower
< a: lower
MB: lower
Δδ: <-- extended beyond question scope
PREG: upper <-- still holding up
CTYPE: lower
< a: lower
MB: upper <-- still holding up
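A likely explanation for the failures above, offered as a sketch rather than something stated in the original answer: ctype_upper() and the < operator work on raw bytes, while É occupies two bytes in UTF-8. The snippet below only inspects those bytes; it uses the "\u{C9}" escape (PHP 7+) so that it does not depend on the encoding of the source file.

```php
<?php
// "\u{C9}" is É (U+00C9) and "\u{E9}" is é (U+00E9).
$first = mb_substr("\u{C9}\u{E9}", 0, 1, 'UTF-8');

echo strlen($first), "\n";             // 2 (bytes)
echo mb_strlen($first, 'UTF-8'), "\n"; // 1 (character)
echo bin2hex($first), "\n";            // c389

// String comparison is byte-wise: 0xC3 is greater than 0x61 ('a'),
// so the "< 'a'" test reports "lower" for É.
var_dump($first < 'a');                // bool(false)
```

Neither byte is an ASCII uppercase letter, which is consistent with ctype_upper() reporting "lower" in the output above; its exact behavior depends on the active locale.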
If anyone needs to differentiate between uppercase letters, lowercase letters, and non-letters, see this post.
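As a minimal sketch of that three-way distinction (this is not taken from the linked post, and the function name classifyFirstChar is hypothetical), the same \p{Lu} test can be paired with \p{Ll} for lowercase letters, with everything else falling through:

```php
<?php
// Hypothetical helper: classify the first character of a UTF-8 string.
function classifyFirstChar(string $text): string
{
    if (preg_match('~^\p{Lu}~u', $text)) {
        return 'upper';   // first character is an uppercase letter
    }
    if (preg_match('~^\p{Ll}~u', $text)) {
        return 'lower';   // first character is a lowercase letter
    }
    // Digits, punctuation, caseless letters, titlecase letters (Lt),
    // an empty string, or invalid UTF-8 all end up here.
    return 'other';
}

foreach (["\u{C9}\u{E9}", 'iou', '1abc'] as $sample) {
    echo $sample, ': ', classifyFirstChar($sample), "\n";
}
```

Whether titlecase or caseless letters should count as "other" is a design choice; adjust the fallthrough if your data needs a finer split.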
This may be extending the scope of the question too far, but if your input characters are especially unpredictable (they might not belong to a category that Lu can handle), you may want to check whether the first character has case variants at all:
\p{L&} or \p{Cased_Letter}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
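A short sketch of that check follows. The function name firstCharIsCased is hypothetical, and the sketch uses the short form \p{L&}; whether the long name \p{Cased_Letter} is accepted may depend on the PCRE version bundled with PHP, which is not checked here.

```php
<?php
// Hypothetical helper: true when the first character is a cased letter
// (Lu, Ll or Lt), false for digits, punctuation and caseless letters.
function firstCharIsCased(string $text): bool
{
    return preg_match('~^\p{L&}~u', $text) === 1;
}

var_dump(firstCharIsCased("\u{C9}\u{E9}")); // É is a cased letter
var_dump(firstCharIsCased("\u{4E2D}"));     // 中 is a letter without case
var_dump(firstCharIsCased('1abc'));         // a digit is not a letter
```

Running this check first lets you report "not applicable" instead of guessing between upper and lower.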
To include Roman numerals ("Number Letters") that have SMALL variants, you can add that extra range to the pattern if necessary.
https://www.fileformat.info/info/unicode/category/Nl/list.htm
Code:
echo preg_match('~^[\p{Lu}\x{2160}-\x{216F}]~u', $test) ? 'upper' : 'not upper';