I am using Apache Spark 1.5.1 and trying to connect to a local SQLite database named clinton.db. Creating a data frame from a table of the database works fine, but when I run some operation on the created object, I get the error below, which says "SQL error or missing database (Connection is closed)". The funny thing is that I still get the result of the operation. Any idea what I can do to solve the problem, i.e., avoid the error?
Start command for spark-shell:
../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --classpath ../libraries/sqlite-jdbc-3.8.11.1.jar
Reading from the database:
val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "Emails")).load()
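(As an aside, the same read can be expressed with the jdbc() overload that DataFrameReader offers in Spark 1.4+. The URL and table name below are just the ones from the options map above; SQLite needs no credentials, so an empty Properties object suffices:

import java.util.Properties

// Equivalent load using DataFrameReader.jdbc; the connection properties are
// empty because this SQLite file requires no user name or password.
val emails = sqlContext.read.jdbc("jdbc:sqlite:../data/clinton.sqlite", "Emails", new Properties())
)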
A simple count (which fails):
emails.count
The error:
15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

res1: Long = 7945
I got the same error today, and the important line is just before the exception:
15/11/30 12:13:02 INFO jdbc.JDBCRDD: closed connection
15/11/30 12:13:02 WARN jdbc.JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
So Spark succeeds in closing the JDBC connection, and then fails to close the JDBC statement.
Looking at the source code, close() is called twice:
Line 358 (org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, Spark 1.5.1)
context.addTaskCompletionListener{ context => close() }
Line 469
override def hasNext: Boolean = {
  if (!finished) {
    if (!gotNext) {
      nextValue = getNext()
      if (finished) {
        close()  // close() is also called here, once the iterator is exhausted
      }
      gotNext = true
    }
  }
  !finished
}
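Taken together, both call sites can fire for the same task: close() runs once from hasNext when the last row has been read, and again from the completion listener when the task ends. Here is a minimal standalone sketch of that sequence (the object and the listener buffer are illustrative stand-ins, not Spark's actual classes):

import scala.collection.mutable.ArrayBuffer

object DoubleCloseSketch {
  private var closed = false

  // Mirrors JDBCRDD.close() in 1.5.1: the guard is checked, but `closed` is
  // never set to true, so a second invocation runs the body again.
  def close(): Unit = {
    if (closed) return
    println("closing statement and connection")
  }

  def main(args: Array[String]): Unit = {
    val listeners = ArrayBuffer[() => Unit]()
    listeners += (() => close())  // like context.addTaskCompletionListener { context => close() }
    close()                       // like the close() inside hasNext once finished is true
    listeners.foreach(_.apply())  // like markTaskCompleted firing the listeners: runs close() again
  }
}

Run as-is, this prints "closing statement and connection" twice; in JDBCRDD the second pass throws instead, because the underlying SQLite statement is already closed.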
If you look at the close() method (line 443)
def close() {
  if (closed) return
you can see that it checks the variable closed, but that value is never set to true.
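Assuming the guard is meant to make close() idempotent, the natural fix is to set the flag once the resources have been released. A sketch of that shape, not the actual patched Spark code:

def close() {
  if (closed) return    // with the flag set below, the second invocation is a no-op
  // ... close the statement, the result set, and the connection as before ...
  closed = true         // this assignment is what Spark 1.5.1 is missing
}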
If I read the code correctly, this bug is still present in master. I have filed a bug report.