Ensuring valid UTF-8 in PHP

Date: 2023-10-05


Question

I'm using PHP to handle text from a variety of sources. I don't expect it to be anything other than UTF-8, ISO 8859-1, or perhaps Windows-1252. If it is anything else, I just need to make sure the text ends up as a valid UTF-8 string, even if characters are lost. Does the //TRANSLIT option of iconv solve this?

For example, would this code ensure that a string is safe to insert into a UTF-8 encoded document (or database)?

function make_safe_for_utf8_use($string) {

    $encoding = mb_detect_encoding($string, "UTF-8,ISO-8859-1,WINDOWS-1252");

    if ($encoding != 'UTF-8') {
        return iconv($encoding, 'UTF-8//TRANSLIT', $string);
    }
    else {
        return $string;
    }
}
                  

Recommended answer

UTF-8 can store any Unicode character. Whatever your source encoding is, including ISO-8859-1 or Windows-1252, UTF-8 can represent every character in it. So you don't have to worry about losing characters when you convert a string from another encoding to UTF-8, and //TRANSLIT, which only matters when the target encoding cannot represent a character, is not needed for that direction.

Further, both ISO-8859-1 and Windows-1252 are single-byte encodings where any byte is valid, so it is not technically possible to distinguish between them. I would choose Windows-1252 as the default match for non-UTF-8 sequences, because the only bytes that decode differently are those in the range 0x80-0x9F. In Windows-1252 these decode to characters such as smart quotes and the euro sign, whereas in ISO-8859-1 they are invisible control characters that are almost never used. Web browsers may sometimes say they are using ISO-8859-1, but often they are really using Windows-1252.
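To make the difference concrete, here is a minimal sketch that decodes the same byte under both encodings. It assumes the iconv extension is available and that the underlying iconv library knows the name CP1252; the variable names are only illustrative.

```php
<?php
// 0x93 lies in the 0x80-0x9F range where the two encodings disagree.
$byte = "\x93";

// Windows-1252 maps 0x93 to U+201C (left double quotation mark).
$asCp1252 = iconv('CP1252', 'UTF-8', $byte);

// ISO-8859-1 maps 0x93 to U+0093, an invisible C1 control character.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

echo bin2hex($asCp1252), "\n"; // e2809c
echo bin2hex($asLatin1), "\n"; // c293

// Outside that range the two agree: 0xE9 is U+00E9 in both.
$sameInBoth = iconv('CP1252', 'UTF-8', "\xE9") === iconv('ISO-8859-1', 'UTF-8', "\xE9");
var_dump($sameInBoth); // bool(true)
```

Both results are valid UTF-8, which is why the choice between the two source encodings affects which characters you get, not whether the output is well formed.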

"Would this code ensure that a string is safe to insert into a UTF-8 encoded document?"

You would certainly want to set the optional 'strict' parameter of mb_detect_encoding to TRUE for this purpose. But I'm not sure this actually catches all invalid UTF-8 sequences. The function does not claim to check a byte sequence for UTF-8 validity explicitly. There have been known cases where mb_detect_encoding guessed UTF-8 incorrectly, though I don't know whether that can still happen in strict mode.
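As a rough illustration of the strict flag, the sketch below passes TRUE as the third argument to mb_detect_encoding. The wrapper name detect_encoding_strict is hypothetical, and the behaviour of detection has varied between PHP versions, so treat this as a sketch rather than a guarantee.

```php
<?php
// Hypothetical wrapper; the third argument of mb_detect_encoding()
// is the strict flag discussed above.
function detect_encoding_strict(string $string)
{
    // With strict = true, an encoding in which the bytes are not valid
    // should not be reported as the match.
    return mb_detect_encoding($string, ['UTF-8', 'ISO-8859-1'], true);
}

// "\xE9" is "é" in ISO-8859-1 but an incomplete sequence in UTF-8.
var_dump(detect_encoding_strict("caf\xE9"));
```

Because every byte string is valid ISO-8859-1, that candidate acts as a catch-all at the end of the list.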

If you want to be sure, do it yourself using the W3-recommended regex:

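function make_safe_for_utf8_use($string) {
    // Return the string unchanged if every byte sequence is well-formed UTF-8.
    if (preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string))
        return $string;
    else
        // Not valid UTF-8: treat the bytes as Windows-1252 and convert.
        return iconv('CP1252', 'UTF-8', $string);
}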
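As an alternative sketch of the same idea, not part of the original answer, the validity test can be delegated to mbstring's mb_check_encoding instead of a regex. The helper name to_valid_utf8 is hypothetical, and the use of mb_convert_encoding for the fallback is a substitution for the iconv call above; it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper name; not from the original answer.
function to_valid_utf8(string $string): string
{
    // mb_check_encoding() reports whether the bytes are valid in the
    // given encoding, so valid UTF-8 passes through untouched.
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }

    // Otherwise treat the bytes as Windows-1252, as the answer suggests.
    return mb_convert_encoding($string, 'UTF-8', 'Windows-1252');
}

echo bin2hex(to_valid_utf8("caf\xE9")), "\n"; // 636166c3a9
```

This version does not run a large pattern over the input, but it may not accept exactly the same set of strings as the regex, which also rejects most ASCII control characters.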
                  
