
  • <small id='sKaUx'></small><noframes id='sKaUx'>


        Selecting with a window function (dense_rank()) in Spark SQL

        Date: 2023-08-20


            <small id='5zbEL'></small><noframes id='5zbEL'>



                  Problem description

                  I have a table containing customer purchase records. I need to assign each purchase to a datetime window, where one window is 8 days long. If I made a purchase today and another one 5 days later, both purchases are in window 1. But if I made one purchase today and the next one 8 days later, the first purchase is in window 1 and the second is in window 2.

                  create temporary table transactions
                   (client_id int,
                   transaction_ts datetime,
                   store_id int);
                  
                   insert into transactions values 
                   (1,'2018-06-01 12:17:37', 1),
                   (1,'2018-06-02 13:17:37', 2),
                   (1,'2018-06-03 14:17:37', 3),
                   (1,'2018-06-09 10:17:37', 2),
                   (2,'2018-06-02 10:17:37', 1),
                   (2,'2018-06-02 13:17:37', 2),
                   (2,'2018-06-08 14:19:37', 3),
                   (2,'2018-06-16 13:17:37', 2),
                   (2,'2018-06-17 14:17:37', 3);
                  

                  The window is 8 days. My problem is that I don't understand how to make dense_rank() OVER (PARTITION BY ...) look at the datetime and build 8-day windows. As a result, I need something like this:

                  client_id,transaction_ts,store_id,window_number
                  1,'2018-06-01 12:17:37', 1,1
                  1,'2018-06-02 13:17:37', 2,1
                  1,'2018-06-03 14:17:37', 3,1
                  1,'2018-06-09 10:17:37', 2,2
                  2,'2018-06-02 10:17:37', 1,1
                  2,'2018-06-02 13:17:37', 2,1
                  2,'2018-06-08 14:19:37', 3,2
                  2,'2018-06-16 13:17:37', 2,3
                  2,'2018-06-17 14:17:37', 3,3
                  

                  Any idea how to get this? I can run it in MySQL or Spark SQL, but MySQL doesn't support PARTITION BY. I still can't find a solution, so any help is appreciated!

                  Recommended answer

                  You can most likely solve this in Spark SQL by combining a time window (the window() function) with a partitioned window function:

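                  import org.apache.spark.sql.functions._
                  import spark.implicits._ // toDF and $"..." syntax; assumes a SparkSession named spark (spark-shell imports this automatically)
                  
                  // Sample data from the question
                  val purchases = Seq(
                    (1, "2018-06-01 12:17:37", 1), (1, "2018-06-02 13:17:37", 2), (1, "2018-06-03 14:17:37", 3),
                    (1, "2018-06-09 10:17:37", 2), (2, "2018-06-02 10:17:37", 1), (2, "2018-06-02 13:17:37", 2),
                    (2, "2018-06-08 14:19:37", 3), (2, "2018-06-16 13:17:37", 2), (2, "2018-06-17 14:17:37", 3)
                  ).toDF("client_id", "transaction_ts", "store_id")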
                  
                  purchases.show(false)
                  +---------+-------------------+--------+
                  |client_id|transaction_ts     |store_id|
                  +---------+-------------------+--------+
                  |1        |2018-06-01 12:17:37|1       |
                  |1        |2018-06-02 13:17:37|2       |
                  |1        |2018-06-03 14:17:37|3       |
                  |1        |2018-06-09 10:17:37|2       |
                  |2        |2018-06-02 10:17:37|1       |
                  |2        |2018-06-02 13:17:37|2       |
                  |2        |2018-06-08 14:19:37|3       |
                  |2        |2018-06-16 13:17:37|2       |
                  |2        |2018-06-17 14:17:37|3       |
                  +---------+-------------------+--------+
                  
                  
                  
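                  // Group each client's purchases into 8-day tumbling windows and collect the values in each window
                  val groupedByTimeWindow = purchases.groupBy($"client_id", window($"transaction_ts", "8 days")).agg(collect_list("transaction_ts").as("transaction_tss"), collect_list("store_id").as("store_ids"))
                  
                  // Number each client's windows 1, 2, 3, ... in order of window start
                  val windowByClient = org.apache.spark.sql.expressions.Window.partitionBy("client_id").orderBy("window.start")
                  
                  val withWindowNumber = groupedByTimeWindow.withColumn("window_number", row_number().over(windowByClient))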
                  
                  withWindowNumber.orderBy("client_id", "window.start").show(false)
                  
                  +---------+---------------------------------------------+---------------------------------------------------------------+---------+-------------+
                  |client_id|window                                       |transaction_tss                                                |store_ids|window_number|
                  +---------+---------------------------------------------+---------------------------------------------------------------+---------+-------------+
                  |1        |[2018-05-28 17:00:00.0,2018-06-05 17:00:00.0]|[2018-06-01 12:17:37, 2018-06-02 13:17:37, 2018-06-03 14:17:37]|[1, 2, 3]|1            |
                  |1        |[2018-06-05 17:00:00.0,2018-06-13 17:00:00.0]|[2018-06-09 10:17:37]                                          |[2]      |2            |
                  |2        |[2018-05-28 17:00:00.0,2018-06-05 17:00:00.0]|[2018-06-02 10:17:37, 2018-06-02 13:17:37]                     |[1, 2]   |1            |
                  |2        |[2018-06-05 17:00:00.0,2018-06-13 17:00:00.0]|[2018-06-08 14:19:37]                                          |[3]      |2            |
                  |2        |[2018-06-13 17:00:00.0,2018-06-21 17:00:00.0]|[2018-06-16 13:17:37, 2018-06-17 14:17:37]                     |[2, 3]   |3            |
                  +---------+---------------------------------------------+---------------------------------------------------------------+---------+-------------+
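The window boundaries in this output (17:00 on 2018-05-28, 2018-06-05 and 2018-06-13) follow from how Spark's window() works. It produces tumbling windows aligned to the Unix epoch (1970-01-01 00:00:00 UTC), not to each client's first purchase.

The plain-Scala sketch below reproduces that arithmetic using only java.time, without Spark. It rests on two assumptions:

- The session time zone was America/Los_Angeles. This is only inferred from the 17:00 boundaries (UTC-7 in June).
- The helper names windowStart and windowEnd are hypothetical.

```scala
import java.time.{Duration, Instant, LocalDateTime, ZoneId}

object EpochAlignedWindow {
  // Assumption: the session time zone behind the output above (UTC-7 in June 2018)
  val zone: ZoneId = ZoneId.of("America/Los_Angeles")
  val windowLength: Duration = Duration.ofDays(8)

  // Epoch millis of the start of the 8-day window containing `local`.
  // Windows are aligned to 1970-01-01T00:00:00Z, so start = ts - (ts mod length).
  private def startMillis(local: LocalDateTime): Long = {
    val ts = local.atZone(zone).toInstant.toEpochMilli
    ts - Math.floorMod(ts, windowLength.toMillis)
  }

  // Render an instant back as wall-clock time in the session zone
  private def toLocal(millis: Long): LocalDateTime =
    Instant.ofEpochMilli(millis).atZone(zone).toLocalDateTime

  // Inclusive start of the window containing `local`
  def windowStart(local: LocalDateTime): LocalDateTime = toLocal(startMillis(local))

  // Exclusive end of the window: start + 8 days on the instant timeline
  def windowEnd(local: LocalDateTime): LocalDateTime =
    toLocal(startMillis(local) + windowLength.toMillis)
}
```

The boundaries are global, so two purchases only about 6 days apart can still land in different windows. This happens with client 2's purchases on 2018-06-02 and 2018-06-08. For this sample data, that matches the window numbers the question asks for.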
                  

                  If needed, you can explode the elements of the store_ids or transaction_tss lists back into separate rows.
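You may want one row per purchase with its window number, as in the question's expected output. In that case you can skip the groupBy/explode round trip. Instead, compute an epoch-aligned 8-day bucket index for every row, then number the buckets per client with dense_rank().

In Spark SQL that would be roughly `dense_rank() OVER (PARTITION BY client_id ORDER BY floor(unix_timestamp(transaction_ts) / 691200))`, where 691200 is 8 days in seconds. Treat that expression as an untested sketch.

The plain-Scala version below models the same logic without Spark. Purchase, bucketOf and windowNumbers are hypothetical names. Timestamps are read as UTC, an assumption that does not change the result for this sample data.

```scala
import java.time.{LocalDateTime, ZoneOffset}

object DenseRankByBucket {
  final case class Purchase(clientId: Int, ts: LocalDateTime, storeId: Int)

  // 8 days in seconds (691200); buckets are aligned to the Unix epoch, like Spark's window()
  val BucketSeconds: Long = 8L * 24 * 60 * 60

  // Assumption: the wall-clock timestamps are read as UTC
  def bucketOf(ts: LocalDateTime): Long =
    Math.floorDiv(ts.toEpochSecond(ZoneOffset.UTC), BucketSeconds)

  // Models dense_rank() OVER (PARTITION BY client_id ORDER BY bucket):
  // each client's distinct buckets are numbered 1, 2, 3, ... without gaps
  def windowNumbers(purchases: Seq[Purchase]): Seq[(Purchase, Int)] = {
    val rank: Map[(Int, Long), Int] =
      purchases.groupBy(_.clientId).toSeq.flatMap { case (client, rows) =>
        rows.map(p => bucketOf(p.ts)).distinct.sorted.zipWithIndex.map {
          case (bucket, i) => (client, bucket) -> (i + 1)
        }
      }.toMap
    // Keep the input order and attach each purchase's window number
    purchases.map(p => p -> rank((p.clientId, bucketOf(p.ts))))
  }
}
```

For the sample data, this yields window numbers 1, 1, 1, 2 for client 1 and 1, 1, 2, 3, 3 for client 2. That is the same as the expected output in the question.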

                  Hope this helps!


                    <small id='uq6ex'></small><noframes id='uq6ex'>
