我想使用 Spark 处理来自 JDBC 源的一些数据.但是首先,我想在JDBC端运行一些查询来过滤列和连接表,而不是从JDBC读取原始表,并将查询结果作为表加载到Spark SQL中.
I want to use Spark to process some data from a JDBC source. But to begin with, instead of reading original tables from JDBC, I want to run some queries on the JDBC side to filter columns and join tables, and load the query result as a table in Spark SQL.
以下加载原始 JDBC 表的语法适用于我:
The following syntax to load raw JDBC table works for me:
df_table1 = sqlContext.read.format('jdbc').options(
url="jdbc:mysql://foo.com:3306",
dbtable="mydb.table1",
user="me",
password="******",
driver="com.mysql.jdbc.Driver" # mysql JDBC driver 5.1.41
).load()
df_table1.show() # succeeded
根据 Spark 文档(我使用的是 PySpark 1.6.3):
According to Spark documentation (I'm using PySpark 1.6.3):
dbtable:应该读取的 JDBC 表.请注意,任何有效的可以在 SQL 查询的 FROM 子句中使用.例如,而不是完整的表,您也可以在括号中使用子查询.
dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.
所以只是为了实验,我尝试了一些简单的方法:
So just for experiment, I tried something simple like this:
df_table1 = sqlContext.read.format('jdbc').options(
url="jdbc:mysql://foo.com:3306",
dbtable="(SELECT * FROM mydb.table1) AS table1",
user="me",
password="******",
driver="com.mysql.jdbc.Driver"
).load() # failed
它抛出了以下异常:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table1 WHERE 1=0' at line 1
我还尝试了其他一些语法变体(添加/删除括号、删除as"子句、切换大小写等),但都没有成功.那么正确的语法是什么?在哪里可以找到更详细的语法文档?此外,错误消息中这个奇怪的WHERE 1 = 0"来自哪里?谢谢!
I also tried a few other variations of the syntax (add / remove parentheses, remove 'as' clause, switch case, etc) without any luck. So what would be the correct syntax? Where can I find more detailed documentation for the syntax? Besides, where does this weird "WHERE 1=0" in error message come from? Thanks!
对于在 Spark SQL 中使用 sql 查询从 JDBC 源读取数据,您可以尝试如下操作:
For reading data from JDBC source using sql query in Spark SQL, you can try something like this:
val df_table1 = sqlContext.read.format("jdbc").options(Map(
("url" -> "jdbc:postgresql://localhost:5432/mydb"),
("dbtable" -> "(select * from table1) as table1"),
("user" -> "me"),
("password" -> "******"),
("driver" -> "org.postgresql.Driver"))
).load()
我用 PostgreSQL 试过了.可以根据MySQL修改.
I tried it using PostgreSQL. You can modify it according to MySQL.
这篇关于如何在 jdbc 数据源中使用 dbtable 选项的子查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!
如何有效地使用窗口函数根据 N 个先前值来决定How to use windowing functions efficiently to decide next N number of rows based on N number of previous values(如何有效地使用窗口函数根据
在“GROUP BY"中重用选择表达式的结果;条款reuse the result of a select expression in the quot;GROUP BYquot; clause?(在“GROUP BY中重用选择表达式的结果;条款?)
Pyspark DataFrameWriter jdbc 函数的 ignore 选项是忽略整Does ignore option of Pyspark DataFrameWriter jdbc function ignore entire transaction or just offending rows?(Pyspark DataFrameWriter jdbc 函数的 ig
使用 INSERT INTO table ON DUPLICATE KEY 时出错,使用 Error while using INSERT INTO table ON DUPLICATE KEY, using a for loop array(使用 INSERT INTO table ON DUPLICATE KEY 时出错,使用 for 循环数组
pyspark mysql jdbc load 调用 o23.load 时发生错误 没有合pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver(pyspark mysql jdbc load 调用 o23.load 时发生错误 没有合适的
如何将 Apache Spark 与 MySQL 集成以将数据库表作为How to integrate Apache Spark with MySQL for reading database tables as a spark dataframe?(如何将 Apache Spark 与 MySQL 集成以将数据库表作为