Why should someone use {} for initializing an empty object in R?



It seems that some programmers are using:


a = {}
a$foo = 1
a$bar = 2



What is the benefit over a = list(foo = 1, bar = 2)?





Why should {} be used? This expression only returns NULL, so a NULL assignment would do the same, wouldn't it?





It's likely a carry-over from usage in other programming languages. That particular GH user is a polyglot, and sometimes old habits die hard.
– hrbrmstr
Sep 1 at 22:31





I only found 6 instances of <- {} on SO using SymbolHound.
– smci
Sep 2 at 17:47






2 Answers



Why should {} be used? This expression only returns NULL, so a NULL assignment would do the same, wouldn't it?



Yes, a <- NULL has the same effect. Using {} is likely a matter of personal style.



NULL is probably the most versatile and confusing R object. From the R Language Definition of NULL:



It is used whenever there is a need to indicate or specify that an object is absent. It should not be confused with a vector or list of zero length.



The NULL object has no type and no modifiable properties. There is only one NULL object in R, to which all instances refer. To test for NULL use is.null. You cannot set attributes on NULL.





Strictly speaking, NULL is just NULL, and it is the only object for which is.null returns TRUE. However, according to ?NULL:



Objects with value NULL can be changed by replacement operators and will be coerced to the type of the right-hand side.



So, while NULL is not identical to a length-0 vector with a legitimate mode (not all modes in R are allowed in a vector; see ?mode for the full list of modes and ?vector for the ones that are legitimate for a vector), this flexible coercion often makes it behave like a length-0 vector:


## examples of atomic mode
integer(0) ## vector(mode = "integer", length = 0)
numeric(0) ## vector(mode = "numeric", length = 0)
character(0) ## vector(mode = "character", length = 0)
logical(0) ## vector(mode = "logical", length = 0)

## "list" mode
list() ## vector(mode = "list", length = 0)

## "expression" mode
expression() ## vector(mode = "expression", length = 0)
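As a quick check of the distinction drawn above, here is a small sketch (the variable name x is only illustrative) showing that NULL has length 0 yet is not the same object as any of the typed length-0 vectors:

```r
x <- NULL

length(x)             # 0, just like an empty vector
is.null(x)            # TRUE
typeof(x)             # "NULL"

# Typed zero-length vectors are not NULL
is.null(list())       # FALSE
is.null(integer(0))   # FALSE
typeof(integer(0))    # "integer"
identical(x, list())  # FALSE
```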



You can do vector concatenation:


c(NULL, 0L) ## c(integer(0), 0L)
c(NULL, expression(1+2)) ## c(expression(), expression(1+2))
c(NULL, list(foo = 1)) ## c(list(), list(foo = 1))
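The equivalences claimed in the comments above can be checked with identical(). This is only a sketch; the names same_int and same_list are made up for illustration:

```r
# NULL contributes nothing to the concatenation, so both sides agree
same_int  <- identical(c(NULL, 0L), c(integer(0), 0L))
same_list <- identical(c(NULL, list(foo = 1)), c(list(), list(foo = 1)))

same_int             # TRUE
same_list            # TRUE
typeof(c(NULL, 0L))  # "integer"
is.null(c(NULL, NULL))  # TRUE: concatenating only NULLs gives NULL
```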



You can grow a vector (as you did in your question):


a <- NULL; a[1] <- 1; a[2] <- 2
## a <- numeric(0); a[1] <- 1; a[2] <- 2

a <- NULL; a[1] <- TRUE; a[2] <- FALSE
## a <- logical(0); a[1] <- TRUE; a[2] <- FALSE

a <- NULL; a$foo <- 1; a$bar <- 2
## a <- list(); a$foo <- 1; a$bar <- 2

a <- NULL; a[1] <- expression(1+1); a[2] <- expression(2+2)
## a <- expression(); a[1] <- expression(1+1); a[2] <- expression(2+2)
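To see which type NULL is coerced to in each case, one can inspect the result with typeof(). A minimal sketch, with illustrative variable names:

```r
grown_num <- NULL
grown_num[1] <- 1
grown_num[2] <- 2
typeof(grown_num)    # "double"

grown_lgl <- NULL
grown_lgl[1] <- TRUE
typeof(grown_lgl)    # "logical"

grown_list <- NULL
grown_list$foo <- 1
grown_list$bar <- 2
typeof(grown_list)   # "list"
identical(grown_list, list(foo = 1, bar = 2))  # TRUE
```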



Using {} to generate NULL is similar to using expression(). Though the two are not identical, the run-time coercion that happens when you later do something with the object makes them practically indistinguishable. For example, when growing a list, any of the following would work:


a <- NULL; a$foo <- 1; a$bar <- 2
a <- numeric(0); a$foo <- 1; a$bar <- 2 ## there is a warning
a <- character(0); a$foo <- 1; a$bar <- 2 ## there is a warning
a <- expression(); a$foo <- 1; a$bar <- 2
a <- list(); a$foo <- 1; a$bar <- 2



For a length-0 vector with an atomic mode, a warning is produced during run-time coercion (because the change from "atomic" to "recursive" is too significant):


#Warning message:
#In a$foo <- 1 : Coercing LHS to a list
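If you want to observe this warning programmatically rather than at the console, a calling handler can record it. The following is only a sketch: the names msgs and a are illustrative, and it assumes an R version that still emits this warning for atomic vectors.

```r
msgs <- character(0)
a <- numeric(0)

# Record the warning message, then let the assignment continue
withCallingHandlers(
  a$foo <- 1,
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

length(msgs)  # number of warnings recorded
is.list(a)    # the atomic vector has been coerced to a list
```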



We don't get a warning with the expression() setup because, according to ?expression:



As an object of mode ‘"expression"’ is a list ...



Well, it is not a "list" in the usual sense; it is an abstract syntax tree that is list-like.



What is the benefit over a = list(foo = 1, bar = 2)?





There is no advantage in doing so. You have probably already read elsewhere that growing objects is bad practice in R. A quick search on Google turns up: growing objects and loop memory pre-allocation.



If you know the length of the vector as well as the value of each of its elements, create it directly, like a = list(foo = 1, bar = 2).



If you know the length of the vector but the values of its elements are yet to be computed (say, by a loop), preallocate the vector and fill it in, like a <- vector("list", 2); a[[1]] <- 1; a[[2]] <- 2; names(a) <- c("foo", "bar").
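For the loop case, the preallocate-then-fill pattern might look like the sketch below. The names n and squares and the computation i^2 are placeholders chosen for illustration, not anything from the question:

```r
n <- 5

# Allocate the full list once instead of growing it inside the loop
squares <- vector("list", n)

for (i in seq_len(n)) {
  squares[[i]] <- i^2
}

names(squares) <- paste0("x", seq_len(n))
```

Here length(squares) is fixed at n from the start, so the loop only overwrites existing slots.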



I actually looked up ?mode, but it doesn't list the possible modes. It points towards ?typeof, which in turn points to the possible values listed in the structure TypeTable in src/main/util.c. I have not been able to find this file, or even the folder (OSX). Any idea where to find it?



It refers to the source of an R distribution, which is available as a ".tar.gz" file on CRAN. An alternative is to look it up on https://github.com/wch/r-source. Either way, this is the table:


TypeTable[] = {
    { "NULL",        NILSXP     },  /* real types */
    { "symbol",      SYMSXP     },
    { "pairlist",    LISTSXP    },
    { "closure",     CLOSXP     },
    { "environment", ENVSXP     },
    { "promise",     PROMSXP    },
    { "language",    LANGSXP    },
    { "special",     SPECIALSXP },
    { "builtin",     BUILTINSXP },
    { "char",        CHARSXP    },
    { "logical",     LGLSXP     },
    { "integer",     INTSXP     },
    { "double",      REALSXP    },  /*- "real", for R <= 0.61.x */
    { "complex",     CPLXSXP    },
    { "character",   STRSXP     },
    { "...",         DOTSXP     },
    { "any",         ANYSXP     },
    { "expression",  EXPRSXP    },
    { "list",        VECSXP     },
    { "externalptr", EXTPTRSXP  },
    { "bytecode",    BCODESXP   },
    { "weakref",     WEAKREFSXP },
    { "raw",         RAWSXP     },
    { "S4",          S4SXP      },
    /* aliases : */
    { "numeric",     REALSXP    },
    { "name",        SYMSXP     },

    { (char *)NULL,  -1         }
};
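Several of these type names can be seen from within R itself. The sketch below compares typeof() with mode() on a few objects; note how the "numeric" and "name" aliases from the table show up in mode() rather than typeof():

```r
typeof(NULL)       # "NULL"
typeof(1)          # "double"
mode(1)            # "numeric"
typeof(quote(x))   # "symbol"
mode(quote(x))     # "name"
typeof(list())     # "list"
typeof(mean)       # "closure"
typeof(sum)        # "builtin"
typeof(`if`)       # "special"
```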



Per R's documentation on braces and parentheses (type ?'{' to read it), braces return the value of the last expression evaluated within them.
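A short sketch of that behaviour, with illustrative variable names: a non-empty block yields its last value, while an empty block has nothing to evaluate and yields NULL.

```r
last_value  <- { 1; 2 }   # the block evaluates to its last expression
empty_value <- {}         # an empty block evaluates to NULL

last_value            # 2
is.null(empty_value)  # TRUE
```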



In this case, a <- {} essentially "returns" a null object and is therefore equivalent to a <- NULL, which establishes an empty variable that can then be treated as a list.



Incidentally, this is why it is possible to write R functions in which the function's output is returned simply by writing the name of the returned variable as the function's final statement. For example:


function(x) {
  y <- x * 2
  return(y)
}



Is equivalent to:


function(x) {
  y <- x * 2
  y
}



Or even:


function(x) {
  y <- x * 2
}



When the final line of the function is an assignment, the value is returned invisibly, so nothing is printed at the console, but the function still returns the expected value if the result is saved into a variable.
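The three variants can be compared directly. In this sketch the function names f_return, f_implicit and f_assign are made up for illustration, and withVisible() is used to show whether a result would be printed:

```r
f_return   <- function(x) { y <- x * 2; return(y) }
f_implicit <- function(x) { y <- x * 2; y }
f_assign   <- function(x) { y <- x * 2 }

# All three return the same value when the result is captured
results <- c(f_return(3), f_implicit(3), f_assign(3))
results                             # 6 6 6

# Only the assignment-terminated version returns invisibly
withVisible(f_implicit(3))$visible  # TRUE
withVisible(f_assign(3))$visible    # FALSE
```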





Thanks for the answer. I know that {} returns the last evaluated expression; that was not meant to be the question. The question was why one would prefer to write {} instead of NULL. No one would write a = {1} instead of a = 1. Is it only to save three characters? I assume there are some drawbacks, such as the parser being faster for a direct NULL assignment.
– petres
Sep 2 at 9:12






