How to reshape data from long to wide format?









up vote
199
down vote

favorite
73












I'm having trouble rearranging the following data frame:



set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)

dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357


I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:



 name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357


I've looked at melt and cast and a few other things, but none seem to do the job.










share|improve this question



















  • 2




    possible duplicate of Reshape three column data frame to matrix
    – Frank
    Oct 8 '13 at 20:53






  • 4




    @Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
    – smci
    Apr 11 '14 at 5:21






  • 3




    @smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
    – Frank
    Apr 11 '14 at 16:49






  • 2




    ^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
    – smci
    Apr 11 '14 at 21:03






  • 2




    @smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
    – Frank
    Apr 11 '14 at 21:13














up vote
199
down vote

favorite
73












I'm having trouble rearranging the following data frame:



set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)

dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357


I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:



 name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357


I've looked at melt and cast and a few other things, but none seem to do the job.










share|improve this question



















  • 2




    possible duplicate of Reshape three column data frame to matrix
    – Frank
    Oct 8 '13 at 20:53






  • 4




    @Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
    – smci
    Apr 11 '14 at 5:21






  • 3




    @smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
    – Frank
    Apr 11 '14 at 16:49






  • 2




    ^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
    – smci
    Apr 11 '14 at 21:03






  • 2




    @smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
    – Frank
    Apr 11 '14 at 21:13












up vote
199
down vote

favorite
73









up vote
199
down vote

favorite
73






73





I'm having trouble rearranging the following data frame:



set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)

dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357


I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:



 name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357


I've looked at melt and cast and a few other things, but none seem to do the job.










share|improve this question















I'm having trouble rearranging the following data frame:



set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)

dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357


I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:



 name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357


I've looked at melt and cast and a few other things, but none seem to do the job.







r reshape r-faq






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 22 '17 at 16:25









Taryn

188k45286351




188k45286351










asked May 4 '11 at 22:27









Steve

1,87972528




1,87972528







  • 2




    possible duplicate of Reshape three column data frame to matrix
    – Frank
    Oct 8 '13 at 20:53






  • 4




    @Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
    – smci
    Apr 11 '14 at 5:21






  • 3




    @smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
    – Frank
    Apr 11 '14 at 16:49






  • 2




    ^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
    – smci
    Apr 11 '14 at 21:03






  • 2




    @smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
    – Frank
    Apr 11 '14 at 21:13












  • 2




    possible duplicate of Reshape three column data frame to matrix
    – Frank
    Oct 8 '13 at 20:53






  • 4




    @Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
    – smci
    Apr 11 '14 at 5:21






  • 3




    @smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
    – Frank
    Apr 11 '14 at 16:49






  • 2




    ^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
    – smci
    Apr 11 '14 at 21:03






  • 2




    @smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
    – Frank
    Apr 11 '14 at 21:13







2




2




possible duplicate of Reshape three column data frame to matrix
– Frank
Oct 8 '13 at 20:53




possible duplicate of Reshape three column data frame to matrix
– Frank
Oct 8 '13 at 20:53




4




4




@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
– smci
Apr 11 '14 at 5:21




@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.
– smci
Apr 11 '14 at 5:21




3




3




@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
– Frank
Apr 11 '14 at 16:49




@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.
– Frank
Apr 11 '14 at 16:49




2




2




^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
– smci
Apr 11 '14 at 21:03




^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.
– smci
Apr 11 '14 at 21:03




2




2




@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
– Frank
Apr 11 '14 at 21:13




@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?
– Frank
Apr 11 '14 at 21:13












9 Answers
9






active

oldest

votes

















up vote
194
down vote



accepted










Using reshape function:



reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")





share|improve this answer
















  • 13




    +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
    – aL3xa
    May 5 '11 at 0:07







  • 106




    Good luck figuring out the arguments you need though
    – hadley
    May 5 '11 at 1:40







  • 2




    reshape is an outstanding example for a horrible function API. It is very close to useless.
    – NoBackingDown
    Oct 26 '17 at 15:18






  • 4




    The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
    – Brian D
    Nov 17 '17 at 17:11







  • 1




    I would say base R still wins vote-wise by a factor of about 2 to 1
    – vonjd
    Nov 22 at 15:14

















up vote
107
down vote













The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



library(tidyr)
spread(dat1, key = numbers, value = value)


From github,




tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.







share|improve this answer


















  • 4




    Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
    – Jake
    Apr 12 '17 at 13:01

















up vote
65
down vote













You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



library(reshape)
cast(dat1, name ~ numbers)


Or using reshape2



library(reshape2)
dcast(dat1, name ~ numbers)





share|improve this answer






















  • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
    – Steve
    May 4 '11 at 22:45






  • 7




    +1 And use reshape2 for a performance gain.
    – Andrie
    May 6 '11 at 11:56






  • 1




    It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
    – thelatemail
    Jun 21 '17 at 22:37

















up vote
27
down vote













Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



(Reference: Efficient reshaping using data.tables)



library(data.table)

setDT(dat1)
dcast(dat1, name ~ numbers, value.var = "value")

# name 1 2 3 4
# 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
# 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


And, as of data.table v1.9.6 we can cast on multiple columns



## add an extra column
dat1[, value2 := value * 2]

## cast multiple value columns
dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

# name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
# 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
# 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627





share|improve this answer


















  • 3




    data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
    – joel.wilson
    Aug 31 '17 at 12:06

















up vote
23
down vote













Using your example dataframe, we could:



xtabs(value ~ name + numbers, data = dat1)





share|improve this answer


















  • 2




    this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
    – cloudscomputes
    Oct 20 '17 at 4:44

















up vote
14
down vote













Other two options:



Base package:



df <- unstack(dat1, form = value ~ numbers)
rownames(df) <- unique(dat1$name)
df


sqldf package:



library(sqldf)
sqldf('SELECT name,
MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
FROM dat1
GROUP BY name')





share|improve this answer



























    up vote
    9
    down vote













    Using base R aggregate function:



    aggregate(value ~ name, dat1, I)

    # name value.1 value.2 value.3 value.4
    #1 firstName 0.4145 -0.4747 0.0659 -0.5024
    #2 secondName -0.8259 0.1669 -0.8962 0.1681





    share|improve this answer





























      up vote
      4
      down vote













      There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




      The whole system is based on two primitives or operators
      cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
      operators have pivot, un-pivot, one-hot encode, transpose, moving
      multiple rows and columns, and many other transforms as simple special
      cases.



      It is easy to write many different operations in terms of the
      cdata primitives. These operators can work-in memory or at big data
      scale (with databases and Apache Spark; for big data use the
      cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
      variants). The transforms are controlled by a control table that
      itself is a diagram of (or picture of) the transform.




      We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



      library(cdata)
      # first build the control table
      pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
      columnToTakeKeysFrom = 'numbers', # this will become column headers
      columnToTakeValuesFrom = 'value', # this contains data
      sep="_") # optional for making column names

      # perform the move of data to columns
      dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
      keyColumns = c('name'), # this(these) column(s) should stay untouched
      controlTable = pivotControlTable# control table above
      )
      dat_wide

      #> name numbers_1 numbers_2 numbers_3 numbers_4
      #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
      #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357





      share|improve this answer



























        up vote
        2
        down vote













        The base reshape function works perfectly fine:



        df <- data.frame(
        year = c(rep(2000, 12), rep(2001, 12)),
        month = rep(1:12, 2),
        values = rnorm(24)
        )
        df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
        df_wide


        Where




        • idvar is the column of classes that separates rows


        • timevar is the column of classes to cast wide


        • v.names is the column containing numeric values


        • direction specifies wide or long format

        • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

        If no idvar exists, create one before using the reshape() function:



        df$id <- c(rep("year1", 12), rep("year2", 12))
        df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
        df_wide


        Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.






        share|improve this answer






















          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f5890584%2fhow-to-reshape-data-from-long-to-wide-format%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          9 Answers
          9






          active

          oldest

          votes








          9 Answers
          9






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          194
          down vote



          accepted










          Using reshape function:



          reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")





          share|improve this answer
















          • 13




            +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
            – aL3xa
            May 5 '11 at 0:07







          • 106




            Good luck figuring out the arguments you need though
            – hadley
            May 5 '11 at 1:40







          • 2




            reshape is an outstanding example for a horrible function API. It is very close to useless.
            – NoBackingDown
            Oct 26 '17 at 15:18






          • 4




            The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
            – Brian D
            Nov 17 '17 at 17:11







          • 1




            I would say base R still wins vote-wise by a factor of about 2 to 1
            – vonjd
            Nov 22 at 15:14














          up vote
          194
          down vote



          accepted










          Using reshape function:



          reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")





          share|improve this answer
















          • 13




            +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
            – aL3xa
            May 5 '11 at 0:07







          • 106




            Good luck figuring out the arguments you need though
            – hadley
            May 5 '11 at 1:40







          • 2




            reshape is an outstanding example for a horrible function API. It is very close to useless.
            – NoBackingDown
            Oct 26 '17 at 15:18






          • 4




            The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
            – Brian D
            Nov 17 '17 at 17:11







          • 1




            I would say base R still wins vote-wise by a factor of about 2 to 1
            – vonjd
            Nov 22 at 15:14












          up vote
          194
          down vote



          accepted







          up vote
          194
          down vote



          accepted






          Using reshape function:



          reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")





          share|improve this answer












          Using reshape function:



          reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered May 4 '11 at 23:20









          Chase

          48.3k12115148




          48.3k12115148







          • 13




            +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
            – aL3xa
            May 5 '11 at 0:07







          • 106




            Good luck figuring out the arguments you need though
            – hadley
            May 5 '11 at 1:40







          • 2




            reshape is an outstanding example for a horrible function API. It is very close to useless.
            – NoBackingDown
            Oct 26 '17 at 15:18






          • 4




            The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
            – Brian D
            Nov 17 '17 at 17:11







          • 1




            I would say base R still wins vote-wise by a factor of about 2 to 1
            – vonjd
            Nov 22 at 15:14












          • 13




            +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
            – aL3xa
            May 5 '11 at 0:07







          • 106




            Good luck figuring out the arguments you need though
            – hadley
            May 5 '11 at 1:40







          • 2




            reshape is an outstanding example for a horrible function API. It is very close to useless.
            – NoBackingDown
            Oct 26 '17 at 15:18






          • 4




            The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
            – Brian D
            Nov 17 '17 at 17:11







          • 1




            I would say base R still wins vote-wise by a factor of about 2 to 1
            – vonjd
            Nov 22 at 15:14







          13




          13




          +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
          – aL3xa
          May 5 '11 at 0:07





          +1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
          – aL3xa
          May 5 '11 at 0:07





          106




          106




          Good luck figuring out the arguments you need though
          – hadley
          May 5 '11 at 1:40





          Good luck figuring out the arguments you need though
          – hadley
          May 5 '11 at 1:40





          2




          2




          reshape is an outstanding example for a horrible function API. It is very close to useless.
          – NoBackingDown
          Oct 26 '17 at 15:18




          reshape is an outstanding example for a horrible function API. It is very close to useless.
          – NoBackingDown
          Oct 26 '17 at 15:18




          4




          4




          The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
          – Brian D
          Nov 17 '17 at 17:11





          The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
          – Brian D
          Nov 17 '17 at 17:11





          1




          1




          I would say base R still wins vote-wise by a factor of about 2 to 1
          – vonjd
          Nov 22 at 15:14




          I would say base R still wins vote-wise by a factor of about 2 to 1
          – vonjd
          Nov 22 at 15:14












          up vote
          107
          down vote













          The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



          library(tidyr)
          spread(dat1, key = numbers, value = value)


          From github,




          tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



          Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.







          share|improve this answer


















          • 4




            Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
            – Jake
            Apr 12 '17 at 13:01














          up vote
          107
          down vote













          The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



          library(tidyr)
          spread(dat1, key = numbers, value = value)


          From github,




          tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



          Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.







          share|improve this answer


















          • 4




            Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
            – Jake
            Apr 12 '17 at 13:01












          up vote
          107
          down vote










          up vote
          107
          down vote









          The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



          library(tidyr)
          spread(dat1, key = numbers, value = value)


          From github,




          tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



          Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.







          share|improve this answer














          The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.



          library(tidyr)
          spread(dat1, key = numbers, value = value)


          From github,




          tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.



          Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.








          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Jul 29 '15 at 16:39

























          answered Jul 29 '14 at 19:37









          Gregor

          62.7k988165




          62.7k988165







          • 4




            Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
            – Jake
            Apr 12 '17 at 13:01












          • 4




            Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
            – Jake
            Apr 12 '17 at 13:01







          4




          4




          Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
          – Jake
          Apr 12 '17 at 13:01




          Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.
          – Jake
          Apr 12 '17 at 13:01










          up vote
          65
          down vote













          You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



          library(reshape)
          cast(dat1, name ~ numbers)


          Or using reshape2



          library(reshape2)
          dcast(dat1, name ~ numbers)





          share|improve this answer






















          • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
            – Steve
            May 4 '11 at 22:45






          • 7




            +1 And use reshape2 for a performance gain.
            – Andrie
            May 6 '11 at 11:56






          • 1




            It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
            – thelatemail
            Jun 21 '17 at 22:37














          up vote
          65
          down vote













          You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



          library(reshape)
          cast(dat1, name ~ numbers)


          Or using reshape2



          library(reshape2)
          dcast(dat1, name ~ numbers)





          share|improve this answer






















          • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
            – Steve
            May 4 '11 at 22:45






          • 7




            +1 And use reshape2 for a performance gain.
            – Andrie
            May 6 '11 at 11:56






          • 1




            It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
            – thelatemail
            Jun 21 '17 at 22:37












          up vote
          65
          down vote










          up vote
          65
          down vote









          You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



          library(reshape)
          cast(dat1, name ~ numbers)


          Or using reshape2



          library(reshape2)
          dcast(dat1, name ~ numbers)





          share|improve this answer














          You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is



          library(reshape)
          cast(dat1, name ~ numbers)


          Or using reshape2



          library(reshape2)
          dcast(dat1, name ~ numbers)






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited May 26 '15 at 14:52









          David Arenburg

          77.6k1093158




          77.6k1093158










          answered May 4 '11 at 22:42









          Ista

          7,44212426




          7,44212426











          • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
            – Steve
            May 4 '11 at 22:45






          • 7




            +1 And use reshape2 for a performance gain.
            – Andrie
            May 6 '11 at 11:56






          • 1




            It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
            – thelatemail
            Jun 21 '17 at 22:37
















          • thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
            – Steve
            May 4 '11 at 22:45






          • 7




            +1 And use reshape2 for a performance gain.
            – Andrie
            May 6 '11 at 11:56






          • 1




            It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
            – thelatemail
            Jun 21 '17 at 22:37















          thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
          – Steve
          May 4 '11 at 22:45




          thank you! I can't believe I didn't see that - I just looked at 'cast' before posting, but couldn't get it to work how I wanted.
          – Steve
          May 4 '11 at 22:45




          7




          7




          +1 And use reshape2 for a performance gain.
          – Andrie
          May 6 '11 at 11:56




          +1 And use reshape2 for a performance gain.
          – Andrie
          May 6 '11 at 11:56




          1




          1




          It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
          – thelatemail
          Jun 21 '17 at 22:37




          It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
          – thelatemail
          Jun 21 '17 at 22:37










          up vote
          27
          down vote













          Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



          (Reference: Efficient reshaping using data.tables)



          library(data.table)

          setDT(dat1)
          dcast(dat1, name ~ numbers, value.var = "value")

          # name 1 2 3 4
          # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
          # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


          And, as of data.table v1.9.6 we can cast on multiple columns



          ## add an extra column
          dat1[, value2 := value * 2]

          ## cast multiple value columns
          dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

          # name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
          # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
          # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627





          share|improve this answer


















          • 3




            data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
            – joel.wilson
            Aug 31 '17 at 12:06














          up vote
          27
          down vote













          Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



          (Reference: Efficient reshaping using data.tables)



          library(data.table)

          setDT(dat1)
          dcast(dat1, name ~ numbers, value.var = "value")

          # name 1 2 3 4
          # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
          # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


          And, as of data.table v1.9.6 we can cast on multiple columns



          ## add an extra column
          dat1[, value2 := value * 2]

          ## cast multiple value columns
          dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

          # name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
          # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
          # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627





          share|improve this answer


















          • 3




            data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
            – joel.wilson
            Aug 31 '17 at 12:06












          up vote
          27
          down vote










          up vote
          27
          down vote









          Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



          (Reference: Efficient reshaping using data.tables)



          library(data.table)

          setDT(dat1)
          dcast(dat1, name ~ numbers, value.var = "value")

          # name 1 2 3 4
          # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
          # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


          And, as of data.table v1.9.6 we can cast on multiple columns



          ## add an extra column
          dat1[, value2 := value * 2]

          ## cast multiple value columns
          dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

          # name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
          # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
          # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627





          share|improve this answer














          Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions



          (Reference: Efficient reshaping using data.tables)



          library(data.table)

          setDT(dat1)
          dcast(dat1, name ~ numbers, value.var = "value")

          # name 1 2 3 4
          # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
          # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814


          And, as of data.table v1.9.6 we can cast on multiple columns



          ## add an extra column
          dat1[, value2 := value * 2]

          ## cast multiple value columns
          dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

          # name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
          # 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
          # 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Mar 27 '16 at 22:51

























          answered Mar 27 '16 at 22:35









          SymbolixAU

          16.1k32886




          16.1k32886







          • 3




            data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
            – joel.wilson
            Aug 31 '17 at 12:06












          • 3




            data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
            – joel.wilson
            Aug 31 '17 at 12:06







          3




          3




          data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
          – joel.wilson
          Aug 31 '17 at 12:06




          data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
          – joel.wilson
          Aug 31 '17 at 12:06










          up vote
          23
          down vote













          Using your example dataframe, we could:



          xtabs(value ~ name + numbers, data = dat1)





          share|improve this answer


















          • 2




            this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
            – cloudscomputes
            Oct 20 '17 at 4:44














          up vote
          23
          down vote













          Using your example dataframe, we could:



          xtabs(value ~ name + numbers, data = dat1)





          share|improve this answer


















          • 2




            this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
            – cloudscomputes
            Oct 20 '17 at 4:44












          up vote
          23
          down vote










          up vote
          23
          down vote









          Using your example dataframe, we could:



          xtabs(value ~ name + numbers, data = dat1)





          share|improve this answer














          Using your example dataframe, we could:



          xtabs(value ~ name + numbers, data = dat1)






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Sep 2 '16 at 7:37









          zx8754

          29.1k76396




          29.1k76396










          answered May 4 '11 at 22:58









          Jim M.

          4,51411430




          4,51411430







          • 2




            this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
            – cloudscomputes
            Oct 20 '17 at 4:44












          • 2




            this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
            – cloudscomputes
            Oct 20 '17 at 4:44







          2




          2




          this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
          – cloudscomputes
          Oct 20 '17 at 4:44




          this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages
          – cloudscomputes
          Oct 20 '17 at 4:44










          up vote
          14
          down vote













          Other two options:



          Base package:



          df <- unstack(dat1, form = value ~ numbers)
          rownames(df) <- unique(dat1$name)
          df


          sqldf package:



          library(sqldf)
          sqldf('SELECT name,
          MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
          MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
          MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
          MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
          FROM dat1
          GROUP BY name')





          share|improve this answer
























            up vote
            14
            down vote













            Other two options:



            Base package:



            df <- unstack(dat1, form = value ~ numbers)
            rownames(df) <- unique(dat1$name)
            df


            sqldf package:



            library(sqldf)
            sqldf('SELECT name,
            MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
            MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
            MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
            MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
            FROM dat1
            GROUP BY name')





            share|improve this answer






















              up vote
              14
              down vote










              up vote
              14
              down vote









              Other two options:



              Base package:



              df <- unstack(dat1, form = value ~ numbers)
              rownames(df) <- unique(dat1$name)
              df


              sqldf package:



              library(sqldf)
              sqldf('SELECT name,
              MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
              MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
              MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
              MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
              FROM dat1
              GROUP BY name')





              share|improve this answer












              Other two options:



              Base package:



              df <- unstack(dat1, form = value ~ numbers)
              rownames(df) <- unique(dat1$name)
              df


              sqldf package:



              library(sqldf)
              sqldf('SELECT name,
              MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
              MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
              MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
              MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
              FROM dat1
              GROUP BY name')






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Jul 14 '15 at 17:44









              mpalanco

              7,16813239




              7,16813239




















                  up vote
                  9
                  down vote













                  Using base R aggregate function:



                  aggregate(value ~ name, dat1, I)

                  # name value.1 value.2 value.3 value.4
                  #1 firstName 0.4145 -0.4747 0.0659 -0.5024
                  #2 secondName -0.8259 0.1669 -0.8962 0.1681





                  share|improve this answer


























                    up vote
                    9
                    down vote













                    Using base R aggregate function:



                    aggregate(value ~ name, dat1, I)

                    # name value.1 value.2 value.3 value.4
                    #1 firstName 0.4145 -0.4747 0.0659 -0.5024
                    #2 secondName -0.8259 0.1669 -0.8962 0.1681





                    share|improve this answer
























                      up vote
                      9
                      down vote










                      up vote
                      9
                      down vote









                      Using base R aggregate function:



                      aggregate(value ~ name, dat1, I)

                      # name value.1 value.2 value.3 value.4
                      #1 firstName 0.4145 -0.4747 0.0659 -0.5024
                      #2 secondName -0.8259 0.1669 -0.8962 0.1681





                      share|improve this answer














                      Using base R aggregate function:



                      aggregate(value ~ name, dat1, I)

                      # name value.1 value.2 value.3 value.4
                      #1 firstName 0.4145 -0.4747 0.0659 -0.5024
                      #2 secondName -0.8259 0.1669 -0.8962 0.1681






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Dec 25 '17 at 4:05









                      Onyambu

                      15.4k1519




                      15.4k1519










                      answered Sep 2 '16 at 7:52









                      Ronak Shah

                      31.2k103753




                      31.2k103753




















                          up vote
                          4
                          down vote













                          There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




                          The whole system is based on two primitives or operators
                          cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
                          operators have pivot, un-pivot, one-hot encode, transpose, moving
                          multiple rows and columns, and many other transforms as simple special
                          cases.



                          It is easy to write many different operations in terms of the
                          cdata primitives. These operators can work-in memory or at big data
                          scale (with databases and Apache Spark; for big data use the
                          cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
                          variants). The transforms are controlled by a control table that
                          itself is a diagram of (or picture of) the transform.




                          We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



                          library(cdata)
                          # first build the control table
                          pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                          columnToTakeKeysFrom = 'numbers', # this will become column headers
                          columnToTakeValuesFrom = 'value', # this contains data
                          sep="_") # optional for making column names

                          # perform the move of data to columns
                          dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
                          keyColumns = c('name'), # this(these) column(s) should stay untouched
                          controlTable = pivotControlTable# control table above
                          )
                          dat_wide

                          #> name numbers_1 numbers_2 numbers_3 numbers_4
                          #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
                          #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357





                          share|improve this answer
























                            up vote
                            4
                            down vote













                            There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




                            The whole system is based on two primitives or operators
                            cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
                            operators have pivot, un-pivot, one-hot encode, transpose, moving
                            multiple rows and columns, and many other transforms as simple special
                            cases.



                            It is easy to write many different operations in terms of the
                            cdata primitives. These operators can work-in memory or at big data
                            scale (with databases and Apache Spark; for big data use the
                            cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
                            variants). The transforms are controlled by a control table that
                            itself is a diagram of (or picture of) the transform.




                            We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



                            library(cdata)
                            # first build the control table
                            pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                            columnToTakeKeysFrom = 'numbers', # this will become column headers
                            columnToTakeValuesFrom = 'value', # this contains data
                            sep="_") # optional for making column names

                            # perform the move of data to columns
                            dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
                            keyColumns = c('name'), # this(these) column(s) should stay untouched
                            controlTable = pivotControlTable# control table above
                            )
                            dat_wide

                            #> name numbers_1 numbers_2 numbers_3 numbers_4
                            #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
                            #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357





                            share|improve this answer






















                              up vote
                              4
                              down vote










                              up vote
                              4
                              down vote









                              There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




                              The whole system is based on two primitives or operators
                              cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
                              operators have pivot, un-pivot, one-hot encode, transpose, moving
                              multiple rows and columns, and many other transforms as simple special
                              cases.



                              It is easy to write many different operations in terms of the
                              cdata primitives. These operators can work-in memory or at big data
                              scale (with databases and Apache Spark; for big data use the
                              cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
                              variants). The transforms are controlled by a control table that
                              itself is a diagram of (or picture of) the transform.




                              We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



                              library(cdata)
                              # first build the control table
                              pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                              columnToTakeKeysFrom = 'numbers', # this will become column headers
                              columnToTakeValuesFrom = 'value', # this contains data
                              sep="_") # optional for making column names

                              # perform the move of data to columns
                              dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
                              keyColumns = c('name'), # this(these) column(s) should stay untouched
                              controlTable = pivotControlTable# control table above
                              )
                              dat_wide

                              #> name numbers_1 numbers_2 numbers_3 numbers_4
                              #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
                              #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357





                              share|improve this answer












                              There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:




                              The whole system is based on two primitives or operators
                              cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
                              operators have pivot, un-pivot, one-hot encode, transpose, moving
                              multiple rows and columns, and many other transforms as simple special
                              cases.



                              It is easy to write many different operations in terms of the
                              cdata primitives. These operators can work-in memory or at big data
                              scale (with databases and Apache Spark; for big data use the
                              cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
                              variants). The transforms are controlled by a control table that
                              itself is a diagram of (or picture of) the transform.




                              We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.



                              library(cdata)
                              # first build the control table
                              pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                              columnToTakeKeysFrom = 'numbers', # this will become column headers
                              columnToTakeValuesFrom = 'value', # this contains data
                              sep="_") # optional for making column names

                              # perform the move of data to columns
                              dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
                              keyColumns = c('name'), # this(these) column(s) should stay untouched
                              controlTable = pivotControlTable# control table above
                              )
                              dat_wide

                              #> name numbers_1 numbers_2 numbers_3 numbers_4
                              #> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
                              #> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357






                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Dec 23 '17 at 23:01









                              dmi3kno

                              1,775521




                              1,775521




















                                  up vote
                                  2
                                  down vote













                                  The base reshape function works perfectly fine:



                                  df <- data.frame(
                                  year = c(rep(2000, 12), rep(2001, 12)),
                                  month = rep(1:12, 2),
                                  values = rnorm(24)
                                  )
                                  df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
                                  df_wide


                                  Where




                                  • idvar is the column of classes that separates rows


                                  • timevar is the column of classes to cast wide


                                  • v.names is the column containing numeric values


                                  • direction specifies wide or long format

                                  • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

                                  If no idvar exists, create one before using the reshape() function:



                                  df$id <- c(rep("year1", 12), rep("year2", 12))
                                  df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
                                  df_wide


                                  Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.






                                  share|improve this answer


























                                    up vote
                                    2
                                    down vote













                                    The base reshape function works perfectly fine:



                                    df <- data.frame(
                                    year = c(rep(2000, 12), rep(2001, 12)),
                                    month = rep(1:12, 2),
                                    values = rnorm(24)
                                    )
                                    df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
                                    df_wide


                                    Where




                                    • idvar is the column of classes that separates rows


                                    • timevar is the column of classes to cast wide


                                    • v.names is the column containing numeric values


                                    • direction specifies wide or long format

                                    • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

                                    If no idvar exists, create one before using the reshape() function:



                                    df$id <- c(rep("year1", 12), rep("year2", 12))
                                    df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
                                    df_wide


                                    Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.






                                    share|improve this answer
























                                      up vote
                                      2
                                      down vote










                                      up vote
                                      2
                                      down vote









                                      The base reshape function works perfectly fine:



                                      df <- data.frame(
                                      year = c(rep(2000, 12), rep(2001, 12)),
                                      month = rep(1:12, 2),
                                      values = rnorm(24)
                                      )
                                      df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
                                      df_wide


                                      Where




                                      • idvar is the column of classes that separates rows


                                      • timevar is the column of classes to cast wide


                                      • v.names is the column containing numeric values


                                      • direction specifies wide or long format

                                      • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

                                      If no idvar exists, create one before using the reshape() function:



                                      df$id <- c(rep("year1", 12), rep("year2", 12))
                                      df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
                                      df_wide


                                      Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.






                                      share|improve this answer














                                      The base reshape function works perfectly fine:



                                      df <- data.frame(
                                      year = c(rep(2000, 12), rep(2001, 12)),
                                      month = rep(1:12, 2),
                                      values = rnorm(24)
                                      )
                                      df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
                                      df_wide


                                      Where




                                      • idvar is the column of classes that separates rows


                                      • timevar is the column of classes to cast wide


                                      • v.names is the column containing numeric values


                                      • direction specifies wide or long format

                                      • the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

                                      If no idvar exists, create one before using the reshape() function:



                                      df$id <- c(rep("year1", 12), rep("year2", 12))
                                      df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
                                      df_wide


                                      Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.







                                      share|improve this answer














                                      share|improve this answer



                                      share|improve this answer








                                      edited Aug 29 at 3:00









                                      SymbolixAU

                                      16.1k32886




                                      16.1k32886










                                      answered Aug 2 at 23:50









                                      Adam Erickson

                                      1,7141320




                                      1,7141320



























                                          draft saved

                                          draft discarded
















































                                          Thanks for contributing an answer to Stack Overflow!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid


                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.

                                          To learn more, see our tips on writing great answers.





                                          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                                          Please pay close attention to the following guidance:


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid


                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.

                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function ()
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f5890584%2fhow-to-reshape-data-from-long-to-wide-format%23new-answer', 'question_page');

                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          Popular posts from this blog

                                          𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

                                          Edmonton

                                          Crossroads (UK TV series)