After noticing that an application tended to discard random emails due to incorrect string value errors, I went through and switched many text columns to use the utf8 column charset and the default column collation (utf8_general_ci) so that it would accept them. This fixed most of the errors, and also stopped the application from getting SQL errors when it hit non-Latin emails.
Despite this, some of the emails are still causing the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)
The contents column is a MEDIUMTEXT datatype which uses the utf8 column charset and the utf8_general_ci column collation. There are no flags that I can toggle in this column.
Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:
One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.
"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:
>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
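The session above is Python 2. For anyone repeating the check on Python 3, where byte strings need a b prefix, a minimal sketch might look like this. The helper name is_valid_utf8 is made up for illustration and is not part of any library.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The bytes quoted in the MySQL error message.
sample = b"\xE4\xC5\xCC\xC9\xD3\xD8"

print(is_valid_utf8(sample))                   # False
print(is_valid_utf8("héllo".encode("utf-8")))  # True
```

The sample fails right at the start: 0xE4 announces a three-byte sequence, but the next byte, 0xC5, is not a continuation byte (those fall in the range 0x80 to 0xBF).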
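If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European", which MySQL calls latin1) is about as permissive as an encoding gets: MySQL's version accepts every byte value. Strict cp1252 leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, but MySQL's latin1 maps those as well.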
Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
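To make that trade-off concrete, here is a rough sketch. It uses Python's codecs as a stand-in for the database's character-set handling, which is an assumption; MySQL's own conversion may differ in detail.

```python
# The bytes quoted in the MySQL error message.
sample = b"\xE4\xC5\xCC\xC9\xD3\xD8"

# A single-byte Western European codec takes these bytes without complaint,
# producing one character per byte.
as_cp1252 = sample.decode("cp1252")
print(len(as_cp1252))  # 6

# Python's "latin-1" codec accepts every possible byte value.
every_byte = bytes(range(256))
print(len(every_byte.decode("latin-1")))  # 256

# The price: genuine UTF-8 read this way turns into mojibake.
genuine = "é".encode("utf-8")    # two bytes: 0xC3 0xA9
print(genuine.decode("cp1252"))  # Ã©
```

Nothing is rejected any more, but the column no longer knows which characters the bytes were meant to be, so any multi-byte UTF-8 text comes back as two or more unrelated characters unless the reader decodes it again by hand.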