        Transforming two dataframes in Spark SQL

        Date: 2023-08-20

                  This article shows how to transform two dataframes in Spark SQL; the question and answer below may be a useful reference for anyone facing the same problem.

                  Problem description

                  I have two dataframes in Spark (Scala) registered as tables:

                  Table 1:

                     +-----+--------+
                     |id   |values  |
                     +-----+--------+
                     |   0 |  v1    |
                     |   0 |  v2    |
                     |   1 |  v3    |
                     |   1 |  v1    |
                     +-----+--------+
                  

                  Table 2:

                     +-----+----+----+----+
                     |id   |v1  |v2  |v3  |
                     +-----+----+----+----+
                     |   0 | a1 | b1 | -  |
                     |   1 | a2 | -  | c2 |
                     +-----+----+----+----+
                  

                  I want to generate a new table using the above two tables.

                  Table 3:

                     +-----+--------+--------+
                     |id   |values  | field  |
                     +-----+--------+--------+
                     |   0 |  v1    | a1     |
                     |   0 |  v2    | b1     |
                     |   1 |  v3    | c2     |
                     |   1 |  v1    | a2     |
                     +-----+--------+--------+
                  

                  Here v1 is of the form:

                    |-- v1: struct (nullable = true)
                    |    |-- level1: string (nullable = true)
                    |    |-- level2: string (nullable = true)
                    |    |-- level3: string (nullable = true)
                    |    |-- level4: string (nullable = true)
                    |    |-- level5: string (nullable = true)
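
                  For reference, this printSchema output corresponds to a struct column with five nullable string fields. A minimal sketch of the matching StructType, reconstructed from the printout above (only the printed field names are assumed):

                   import org.apache.spark.sql.types._
                   
                   // StructType matching the printed schema: level1..level5,
                   // all nullable strings.
                   val v1Type = StructType(
                     (1 to 5).map(i => StructField(s"level$i", StringType, nullable = true))
                   )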
                  

                  I am using Spark SQL in Scala.

                  Is it possible to do this by writing a SQL query, or by using some Spark functions on the dataframes?

                  Recommended answer

                  Here is sample code you can use; it will generate the output shown in Table 3:

                   // Run in spark-shell, where sc and the toDF implicits are already
                   // in scope; in a standalone app you would also need
                   // import spark.implicits._
                   val df1 = sc.parallelize(Seq((0, "v1"), (0, "v2"), (1, "v3"), (1, "v1"))).toDF("id", "values")
                   val df2 = sc.parallelize(Seq((0, "a1", "b1", "-"), (1, "a2", "-", "c2"))).toDF("id", "v1", "v2", "v3")
                   
                   // Join on id, then for each row look up the column whose *name*
                   // is stored in the "values" column.
                   val joinedDF = df1.join(df2, "id")
                   val resultDF = joinedDF.rdd.map { row =>
                     val id     = row.getAs[Int]("id")
                     val values = row.getAs[String]("values")
                     val field  = row.getAs[String](values) // dynamic lookup by column name
                     (id, values, field)
                   }.toDF("id", "values", "field")
                  

                  Tested on the console:

                  scala> val df1=sc.parallelize(Seq((0,"v1"),(0,"v2"),(1,"v3"),(1,"v1"))).toDF("id","values")
                  df1: org.apache.spark.sql.DataFrame = [id: int, values: string]
                  
                  scala> df1.show
                  +---+------+
                  | id|values|
                  +---+------+
                  |  0|    v1|
                  |  0|    v2|
                  |  1|    v3|
                  |  1|    v1|
                  +---+------+
                  
                  
                   scala> val df2=sc.parallelize(Seq((0,"a1","b1","-"),(1,"a2","-","c2"))).toDF("id","v1","v2","v3")
                  df2: org.apache.spark.sql.DataFrame = [id: int, v1: string ... 2 more fields]
                  
                  scala> df2.show
                  +---+---+---+---+
                  | id| v1| v2| v3|
                  +---+---+---+---+
                  |  0| a1| b1|  -|
                   |  1| a2|  -| c2|
                  +---+---+---+---+
                  
                  
                  scala> val joinedDF=df1.join(df2,"id")
                  joinedDF: org.apache.spark.sql.DataFrame = [id: int, values: string ... 3 more fields]
                  
                  scala> joinedDF.show
                  +---+------+---+---+---+                                                        
                  | id|values| v1| v2| v3|
                  +---+------+---+---+---+
                   |  1|    v3| a2|  -| c2|
                   |  1|    v1| a2|  -| c2|
                  |  0|    v1| a1| b1|  -|
                  |  0|    v2| a1| b1|  -|
                  +---+------+---+---+---+
                  
                  
                  scala> val resultDF=joinedDF.rdd.map{row=>
                       | val id=row.getAs[Int]("id")
                       | val values=row.getAs[String]("values")
                        | val field=row.getAs[String](values)
                        | (id,values,field)
                        | }.toDF("id","values","field")
                  resultDF: org.apache.spark.sql.DataFrame = [id: int, values: string ... 1 more field]
                  
                  scala> 
                  
                  scala> resultDF.show
                  +---+------+------+                                                             
                   | id|values| field|
                  +---+------+------+
                   |  1|    v3|    c2|
                  |  1|    v1|    a2|
                  |  0|    v1|    a1|
                  |  0|    v2|    b1|
                  +---+------+------+
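
                  The same lookup can also stay entirely in the DataFrame API, avoiding the round trip through RDDs: build one when(...) per candidate column and coalesce the results, since exactly one branch matches per row. A minimal sketch, assuming the same df1 and df2 and the fixed column list v1, v2, v3:

                   import org.apache.spark.sql.functions._
                   
                   // One when(...) per column name; coalesce returns the first
                   // non-null, i.e. the value of the column named in "values".
                   val fieldCol = coalesce(
                     Seq("v1", "v2", "v3").map(c => when(col("values") === c, col(c))): _*
                   )
                   
                   val resultDF2 = df1.join(df2, "id")
                     .select(col("id"), col("values"), fieldCol.as("field"))

                  Staying in the DataFrame API keeps the whole query inside the Catalyst optimizer, which the RDD map bypasses.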
                  

                  I hope this solves your problem. Thanks!
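
                  For completeness, the question also asks about a plain SQL query over the registered tables; a CASE expression performs the same lookup. A sketch, where the view names table1 and table2 are assumptions:

                   // Register the dataframes under assumed view names.
                   df1.createOrReplaceTempView("table1")
                   df2.createOrReplaceTempView("table2")
                   
                   // `values` is backquoted because VALUES is a SQL keyword.
                   val sqlResult = spark.sql("""
                     SELECT t1.id, t1.`values`,
                            CASE t1.`values` WHEN 'v1' THEN t2.v1
                                             WHEN 'v2' THEN t2.v2
                                             WHEN 'v3' THEN t2.v3
                            END AS field
                     FROM table1 t1
                     JOIN table2 t2 ON t1.id = t2.id
                   """)
                   sqlResult.show()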
