Python Pandas Create New Column with groupby().sum()

I am trying to create a new column from a groupby calculation. In the code below, I get the correct calculated values for each date (see group below), but when I try to create a new column (df['Data4']) from it, I get NaN. I want the new column to hold the sum of 'Data3' for each date, applied to every row with that date. For example, 2015-05-08 appears in 2 rows (total is 50 + 5 = 55), and in this new column I would like to have 55 in both of those rows.

    import pandas as pd
    import numpy as np
    from pandas import DataFrame

    df = pd.DataFrame({
        'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
                 '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
        'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
        'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
        'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
    })

    group = df['Data3'].groupby(df['Date']).sum()

    df['Data4'] = group

Tags: python, pandas

asked May 14 '15 at 18:44 by fe ner

2 Answers

You want to use transform. It returns a Series whose index is aligned with df, so you can assign the result directly as a new column:

    In [74]:

    df = pd.DataFrame({
        'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
                 '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
        'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
        'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
        'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
    })

    df['Data4'] = df['Data3'].groupby(df['Date']).transform('sum')
    df
    Out[74]:
       Data2  Data3        Date   Sym  Data4
    0     11      5  2015-05-08  aapl     55
    1      8      8  2015-05-07  aapl    108
    2     10      6  2015-05-06  aapl     66
    3     15      1  2015-05-05  aapl    121
    4    110     50  2015-05-08  aaww     55
    5     60    100  2015-05-07  aaww    108
    6    100     60  2015-05-06  aaww     66
    7     40    120  2015-05-05  aaww    121
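The NaN values in the question come from index alignment: the aggregated result is labelled by date, while df is labelled 0 to 7, so no labels match when the Series is assigned as a column. The sketch below reproduces both behaviours on a trimmed copy of the question's data. The names `broken`, `same` and the reduced column set are illustrative choices, not part of the original posts.

```python
import pandas as pd

# Trimmed copy of the question's frame (only the columns needed here).
df = pd.DataFrame({
    'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
})

# sum() collapses each date to one row; the result is indexed by date string.
group = df['Data3'].groupby(df['Date']).sum()

# Assigning it as a column aligns on index labels. df is labelled 0..7,
# group is labelled by date, so nothing matches and every value is NaN.
broken = df.copy()
broken['Data4'] = group

# transform('sum') broadcasts each group's total back onto the original
# row labels, so the assignment lines up row by row.
df['Data4'] = df['Data3'].groupby(df['Date']).transform('sum')

# Same computation written with the key passed by column name.
same = df.groupby('Date')['Data3'].transform('sum')

print(broken['Data4'].isna().all())
print(df)
```

Both spellings of the transform call give the same values; which one to use is a matter of taste.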






answered May 14 '15 at 18:46 by EdChum

• What happens if we have a second groupby, as in stackoverflow.com/a/40067099/281545? – Mr_and_Mrs_D, May 5 '18 at 20:40

• @Mr_and_Mrs_D In that case you'd have to reset the index and perform a left merge on the common columns to add the column back. – EdChum, May 5 '18 at 20:56

• Alternatively, one can use df.groupby('Date')['Data3'].transform('sum') (which I find slightly easier to remember). – Cleb, Aug 24 '18 at 11:32
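The merge route mentioned in the comments could look roughly like the following. This is only a sketch of the general pattern (aggregate, reset the index, left merge), not a reproduction of the linked answer, and it assumes a single grouping key, Date. The names `totals` and `merged` are hypothetical. With several grouping keys, the same list would presumably be passed to both `groupby` and `on`.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
})

# Aggregate to one row per date, then move the group key from the index
# back into an ordinary column so it can serve as a merge key.
totals = (
    df.groupby('Date')['Data3']
      .sum()
      .reset_index(name='Data4')
)

# A left merge keeps every row of df in its original order and attaches
# the matching per-date total to each one.
merged = df.merge(totals, on='Date', how='left')

print(merged)
```

For a plain per-group total, transform remains the shorter option; the merge form is mainly useful when the aggregate has already been computed as a separate frame.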

















I stumbled upon an interesting idiosyncrasy in the API. It seems that you can consistently shave a few milliseconds off the time taken by transform if you instead use a direct GroupBy function and broadcast the result using map:

    df
             Date   Sym  Data2  Data3
    0  2015-05-08  aapl     11      5
    1  2015-05-07  aapl      8      8
    2  2015-05-06  aapl     10      6
    3  2015-05-05  aapl     15      1
    4  2015-05-08  aaww    110     50
    5  2015-05-07  aaww     60    100
    6  2015-05-06  aaww    100     60
    7  2015-05-05  aaww     40    120

    df.Date.map(df.groupby('Date')['Data3'].sum())

    0     55
    1    108
    2     66
    3    121
    4     55
    5    108
    6     66
    7    121
    Name: Date, dtype: int64

Compare with

    df.groupby('Date')['Data3'].transform('sum')

    0     55
    1    108
    2     66
    3    121
    4     55
    5    108
    6     66
    7    121
    Name: Data3, dtype: int64
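As a quick check that the two forms are interchangeable for this data, the sketch below builds both results and compares them. The only visible difference is the Series name (Date for the map form, Data3 for the transform form), which stops mattering once the result is assigned to a column. The variable names and the extra mean example are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
})

# Aggregate once per date, then look each row's date up in that result.
via_map = df['Date'].map(df.groupby('Date')['Data3'].sum())

# Let pandas broadcast the per-date sum back to the rows itself.
via_transform = df.groupby('Date')['Data3'].transform('sum')

# The values agree row for row; only the Series names differ.
print(via_map.tolist() == via_transform.tolist())
print(via_map.name, via_transform.name)

# The same lookup pattern works with other direct GroupBy functions.
mean_via_map = df['Date'].map(df.groupby('Date')['Data3'].mean())

# On assignment the column label replaces the Series name.
df['Data4'] = via_map
```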



My tests show that map is a bit faster, provided you can use a direct GroupBy function (such as mean, min, max, first, etc.). It tends to be faster in most general situations up to around 200 thousand records. After that, the performance really depends on the data.

[Image: benchmark plot of GroupBy.transform versus GroupBy.sum + map over N, log-log axes]

I would say this is a nice alternative to know, and it is better if you have smaller frames with smaller numbers of groups, but I would recommend transform as a first choice. Thought this was worth sharing anyway.



Benchmarking code, for reference:

    import numpy as np
    import pandas as pd
    import perfplot

    perfplot.show(
        # Build a frame with n rows and roughly n // 10 distinct groups.
        setup=lambda n: pd.DataFrame({'A': np.random.choice(n // 10, n), 'B': np.ones(n)}),
        kernels=[
            lambda df: df.groupby('A')['B'].transform('sum'),
            lambda df: df.A.map(df.groupby('A')['B'].sum()),
        ],
        labels=['GroupBy.transform', 'GroupBy.sum + map'],
        n_range=[2**k for k in range(5, 20)],
        xlabel='N',
        logy=True,
        logx=True
    )
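If perfplot is not installed, a rough comparison at a single size can be made with the standard library's timeit. This is a minimal sketch rather than a replacement for the benchmark above: the frame size `n`, the seed, and the repeat counts are arbitrary assumptions, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # arbitrary size, chosen only for illustration
rng = np.random.default_rng(0)

# Same shape as the perfplot setup: about n // 10 groups, constant values.
df = pd.DataFrame({'A': rng.integers(0, n // 10, size=n), 'B': np.ones(n)})


def with_transform():
    return df.groupby('A')['B'].transform('sum')


def with_map():
    return df['A'].map(df.groupby('A')['B'].sum())


# Confirm both approaches produce the same numbers before timing them.
results_match = np.allclose(with_transform().to_numpy(), with_map().to_numpy())

# Take the best of a few repeats to reduce noise from other processes.
t_transform = min(timeit.repeat(with_transform, number=20, repeat=3))
t_map = min(timeit.repeat(with_map, number=20, repeat=3))

print(f"results match:     {results_match}")
print(f"GroupBy.transform: {t_transform:.4f} s for 20 runs")
print(f"GroupBy.sum + map: {t_map:.4f} s for 20 runs")
```

A single size says nothing about how the two scale, which is what the perfplot sweep over N is for.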






answered Jan 29 at 9:09, edited Jan 29 at 9:40, by coldspeed