我想在我的 Glue 作业中从 Aurora-rds mysql 表创建一个 DynamicFrame.我可以使用自定义查询从我的 rds 表创建 DynamicFrame - 有 where 子句吗?我不想每次都在我的 DynamicFrame 中读取整个表格,然后再进行过滤.看了这个网站,但没有在这里或其他地方找到任何选项,https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html
I want to create a DynamicFrame in my Glue job from an Aurora-rds mysql table. Can I create DynamicFrame from my rds table using a custom query - having a where clause? I dont want to read the entire table every time in my DynamicFrame and then filter later. Looked at this website but didnt find any option here or elsewhere, https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html
构造 JDBC 连接选项
connection_mysql5_options = {"url": "jdbc:mysql://:3306/db","dbtable": "测试",用户":管理员","密码": "密码"}
connection_mysql5_options = { "url": "jdbc:mysql://:3306/db", "dbtable": "test", "user": "admin", "password": "pwd"}
从 MySQL 5 读取 DynamicFrame
df_mysql5 =glueContext.create_dynamic_frame.from_options(connection_type="mysql",connection_options=connection_mysql5_options)
df_mysql5 = glueContext.create_dynamic_frame.from_options(connection_type="mysql", connection_options=connection_mysql5_options)
有什么方法可以给出一个 where 子句并说只从测试表中选择前 100 行,说它有一个名为id"的列,我想使用这个查询来获取:
Is there any way to give a where clause and say select only top 100 rows from test table, say it has a column named "id" and I want to fetch using this query:
select * from test where id<100;
select * from test where id<100;
感谢任何帮助.谢谢!
抱歉,我本来可以发表评论的,但我没有足够的声誉.我能够在 AWS Glue 中使用 Guillermo AMS 提供的解决方案,但它确实需要两个更改:
Apologies, I would have made a comment but I do not have sufficient reputation. I was able to make the solution that Guillermo AMS provided work within AWS Glue, but it did require two changes:
在我下面的解决方案中,我还围绕所需的对象和导入添加了一些上下文.
我的解决方案最终看起来像:
In my solution below I have also added a bit of context around the needed objects and imports.
My solution ended up looking like:
from awsglue.context import GlueContext
from pyspark.context import SparkContext
glue_context = GlueContext(SparkContext.getOrCreate())
tmp_data_frame = glue_context.spark_session.read\
.format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")\
.option("url", jdbc_url)\
.option("user", username)\
.option("password", password)\
.option("dbtable", "(select * from test where id<100)")\
.load()
这篇关于从选项(来自 rds - mysql)创建动态框架,提供带有 where 子句的自定义查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!