I want to create some sample programs that deal with encodings. Specifically, I want to use wide strings like:
wstring a=L"grüßen";
wstring b=L"שלום עולם!";
wstring c=L"中文";
I want to write them this way because these are sample programs.
This is absolutely trivial with gcc, which treats source code as UTF-8 encoded text. However, straightforward compilation does not work under MSVC. I know that I can encode the strings using escape sequences, but I would prefer to keep them as readable text.
Is there any option that I can specify as a command-line switch for "cl" to make this work? Is there a command-line switch like gcc's -finput-charset?
If not, how would you suggest keeping the text natural for the user?
Note: adding a BOM to the UTF-8 file is not an option because the file then becomes non-compilable by other compilers.
Note 2: I need it to work in MSVC version 9 (VS 2008) and later.
Real answer: there is no solution.
For those who subscribe to the motto "better late than never", Visual Studio 2015 (version 19 of the compiler) now supports this.
The new /source-charset command-line switch allows you to specify the character set encoding used to interpret source files. It takes a single parameter, which can be either the IANA or ISO character set name:
/source-charset:utf-8
or the decimal identifier of a particular code page (preceded by a dot):
/source-charset:.65001
Official documentation is available for the switch, and a detailed article on the Visual C++ Team Blog describes these new options.
There is also a complementary /execution-charset switch that works in exactly the same way but controls how narrow character and string literals are generated in the executable. Finally, there is a shortcut switch, /utf-8, that sets both /source-charset:utf-8 and /execution-charset:utf-8.
These command-line options are incompatible with the old #pragma setlocale and #pragma execution_character_set directives, and they apply globally to all source files.
For users stuck on older versions of the compiler, the best option is still to save your source files as UTF-8 with a BOM (as other answers have suggested, the IDE can do this when saving). The compiler detects the BOM automatically and interprets the file as UTF-8. GCC also accepts a BOM at the start of a source file without choking, which makes this approach functionally portable.