How does a tumbling window work in KSQL? My query returns the same result with or without it

I am using a KSQL stream and counting the events that arrive every 5 minutes. Here is my query:



select count(*), created_on_date from TABLE_NAME window tumbling (size 5 minutes) group by created_on_date;


It returns these results:



2 | 2018-11-13 09:54:50
3 | 2018-11-13 09:54:49
3 | 2018-11-13 09:54:52
3 | 2018-11-13 09:54:51
3 | 2018-11-13 09:54:50


The query without the tumbling window:



select count(*), created_on_date from OP_UPDATE_ONLY group by created_on_date;


Result:



1 | 2018-11-13 09:55:08
2 | 2018-11-13 09:55:09
1 | 2018-11-13 09:55:10
3 | 2018-11-13 09:55:09
4 | 2018-11-13 09:55:12


Both queries return the same kind of result, so what difference does the tumbling window make?

      apache-kafka ksql

      edited Nov 14 '18 at 23:44 by cricket_007
      asked Nov 13 '18 at 11:02 by Ritu

          1 Answer
          A tumbling window aggregation counts the number of events per key within fixed-size, non-overlapping windows of time. The window is based on the timestamp of your stream, which is inherited from the Kafka message by default but can be overridden with WITH (TIMESTAMP='my_column') when you create the stream. So you could declare created_on_date as the timestamp column and then aggregate over windows of that time.
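To make the bucketing concrete, here is a minimal Python sketch (not KSQL itself) of how a tumbling window could assign events to windows. It assumes windows are aligned to the Unix epoch, as Kafka Streams time windows are, and that the event timestamp is taken from created_on_date parsed as UTC. The helper names `window_start`, `to_ms`, and `count_per_window` are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import List

WINDOW_SIZE_MS = 5 * 60 * 1000  # 5-minute tumbling window


def window_start(timestamp_ms: int, size_ms: int = WINDOW_SIZE_MS) -> int:
    """Return the start (epoch ms) of the epoch-aligned window holding timestamp_ms."""
    return timestamp_ms - (timestamp_ms % size_ms)


def to_ms(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC (an assumption) into epoch milliseconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def count_per_window(event_times: List[str]) -> Counter:
    """Count events per tumbling window, keyed by the window start in epoch ms."""
    return Counter(window_start(to_ms(t)) for t in event_times)


# Sample timestamps in the style of the question's output.
events = [
    "2018-11-13 09:54:50",
    "2018-11-13 09:54:52",
    "2018-11-13 09:55:08",
    "2018-11-13 09:55:09",
]
counts = count_per_window(events)
# The first two events fall into the 09:50 window, the last two into the 09:55 window.
```

Each event belongs to exactly one window, which is what distinguishes a tumbling window from a hopping or sliding one.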



          The second query aggregates over the entire stream of messages. Because your messages happen to carry a timestamp, grouping by it gives the illusion of a time-based aggregation. Your windowed query also groups by created_on_date, so each window is still split into one row per distinct value, which is why the two outputs look alike. However, if you wanted to know how many events occurred within, say, an hour, the unwindowed query would be of no use: you can only count at the grain of created_on_date.
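The difference in grain can be sketched in Python as well. This is only an illustration with made-up sample values, and it assumes the event timestamp equals created_on_date (parsed as UTC) and a one-hour window. The names `per_value`, `per_window_and_value`, and `per_window` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

HOUR_MS = 60 * 60 * 1000  # one-hour tumbling window


def to_ms(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC (an assumption) into epoch milliseconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


created_on_dates = [
    "2018-11-13 09:54:50",
    "2018-11-13 09:54:50",
    "2018-11-13 09:54:51",
    "2018-11-13 09:55:09",
    "2018-11-13 10:02:00",
]

# Unwindowed GROUP BY created_on_date: one count per distinct second.
per_value = Counter(created_on_dates)

# Windowed GROUP BY created_on_date: the key is (window, value), so the rows
# are still one per distinct second and the counts match the unwindowed ones.
per_window_and_value = Counter((to_ms(t) // HOUR_MS, t) for t in created_on_dates)

# Windowed count on a coarser key (here: the window alone) gives events per hour.
per_window = Counter(to_ms(t) // HOUR_MS for t in created_on_dates)
```

Only the last aggregation answers "how many events per hour"; the first two differ in their key but not in their counts, which matches what the question observed.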



          So the first approach, with a window, is usually the right one, because you typically want to answer a business question about an aggregation within a given time period, not over an unbounded stream of data.

                answered Nov 13 '18 at 11:22 by Robin Moffatt