How to use a subquery for the dbtable option in a JDBC data source?

2023-04-04 Database questions

Problem description

I want to use Spark to process some data from a JDBC source. But to begin with, instead of reading original tables from JDBC, I want to run some queries on the JDBC side to filter columns and join tables, and load the query result as a table in Spark SQL.

The following syntax for loading a raw JDBC table works for me:

df_table1 = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="mydb.table1",
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver" # mysql JDBC driver 5.1.41
).load() 
df_table1.show() # succeeded

According to Spark documentation (I'm using PySpark 1.6.3):

dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.

So, just as an experiment, I tried something simple like this:

df_table1 = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="(SELECT * FROM mydb.table1) AS table1",
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver"
).load() # failed

It threw the following exception:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table1 WHERE 1=0' at line 1

I also tried a few other variations of the syntax (adding or removing parentheses, removing the 'as' clause, switching case, etc.) without any luck. So what would be the correct syntax? Where can I find more detailed documentation for the syntax? Besides, where does this weird "WHERE 1=0" in the error message come from? Thanks!

Recommended answer

To read data from a JDBC source using a SQL query in Spark SQL, you can try something like this:

val df_table1 = sqlContext.read.format("jdbc").options(Map(
    ("url" -> "jdbc:postgresql://localhost:5432/mydb"),
    ("dbtable" -> "(select * from table1) as table1"),
    ("user" -> "me"),
    ("password" -> "******"),
    ("driver" -> "org.postgresql.Driver"))
).load()

I tested this with PostgreSQL. You can adapt it for MySQL.
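
On the "WHERE 1=0" part of the question: as far as I can tell, Spark 1.6 resolves the schema by splicing the dbtable string into a zero-row query of the form SELECT * FROM <dbtable> WHERE 1=0, which would explain the fragment in the error message. The sketch below only illustrates that idea. The helper names as_dbtable and schema_probe are hypothetical (they are not Spark APIs), the column names are made up, and an in-memory SQLite database stands in for the real server, so it shows nothing about MySQL-specific behavior.

```python
import sqlite3


def as_dbtable(query, alias):
    """Wrap a SELECT statement in parentheses with an alias (hypothetical helper)."""
    return "({}) AS {}".format(query.strip().rstrip(";"), alias)


def schema_probe(dbtable):
    """Approximate the zero-row statement used to discover column names.

    Assumption: this mirrors the "SELECT * FROM <dbtable> WHERE 1=0" pattern
    whose tail shows up in the error message from the question.
    """
    return "SELECT * FROM {} WHERE 1=0".format(dbtable)


# Build the value that would be passed as the dbtable option.
dbtable = as_dbtable("SELECT id, name FROM table1", "t")
probe = schema_probe(dbtable)
print(probe)

# Stand-in database: SQLite in memory, with a made-up table1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, note TEXT)")
conn.execute("INSERT INTO table1 VALUES (1, 'a', 'x')")

# The probe returns no rows but still exposes the column metadata.
cursor = conn.execute(probe)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
conn.close()
```

The string built by as_dbtable has the same shape as the dbtable value in the examples above. Because the whole value is pasted after FROM, anything that is not valid there (a missing alias, a trailing semicolon) surfaces as a syntax error reported near "WHERE 1=0".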
