Are list comprehensions syntactic sugar for `list(generator expression)` in Python 3?

In Python 3, is a list comprehension simply syntactic sugar for a generator expression fed into the list function?

e.g. is the following code:

    squares = [x**2 for x in range(1000)]

actually converted in the background into the following?

    squares = list(x**2 for x in range(1000))

I know the output is identical, and Python 3 fixes the surprising side effects that list comprehensions had on surrounding namespaces. But in terms of what the CPython interpreter does under the hood, is the former converted to the latter, or is there any difference in how the code gets executed?

I found this claim of equivalence in the comments section to this question, and a quick google search showed the same claim being made here.

There was also some mention of this in the What's New in Python 3.0 docs, but the wording is somewhat vague:

Also note that list comprehensions have different semantics: they are closer to syntactic sugar for a generator expression inside a list() constructor, and in particular the loop control variables are no longer leaked into the surrounding scope.

I'm not sure that this assumption is correct. AFAIK a list comprehension is syntactic sugar for a for loop whereas a generator expression has much different semantics -- namely that it "generates" values iteratively. shrugs Maybe the semantics have changed in Python 3 :)
– James Mills
May 7 '15 at 9:01

NB: The exact wording is "closer to a generator expression in the list() constructor" -- Not that they are exactly that.
– James Mills
May 7 '15 at 9:04

@JamesMills Thanks, right, precisely - 'closer to syntactic sugar' is the bit that has me somewhat baffled. I mean, how close exactly? Is it definitively not syntactic sugar?
– zehnpaard
May 7 '15 at 9:11

When in doubt, use the dis module to check.
– Karl Knechtel
May 7 '15 at 9:12

4 Answers

The two work differently. The list comprehension version takes advantage of the special bytecode LIST_APPEND, which calls PyList_Append directly. Hence it avoids both the attribute lookup of list.append and a function call at the Python level.

    >>> import dis
    >>> def func_lc():
    ...     [x**2 for x in y]
    ...
    >>> dis.dis(func_lc)
      2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
                  3 LOAD_CONST               2 ('func_lc.<locals>.<listcomp>')
                  6 MAKE_FUNCTION            0
                  9 LOAD_GLOBAL              0 (y)
                 12 GET_ITER
                 13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 16 POP_TOP
                 17 LOAD_CONST               0 (None)
                 20 RETURN_VALUE
    >>> lc_object = list(dis.get_instructions(func_lc))[0].argval
    >>> lc_object
    <code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
    >>> dis.dis(lc_object)
      2           0 BUILD_LIST               0
                  3 LOAD_FAST                0 (.0)
            >>    6 FOR_ITER                16 (to 25)
                  9 STORE_FAST               1 (x)
                 12 LOAD_FAST                1 (x)
                 15 LOAD_CONST               0 (2)
                 18 BINARY_POWER
                 19 LIST_APPEND              2
                 22 JUMP_ABSOLUTE            6
            >>   25 RETURN_VALUE

On the other hand, the list() version simply passes the generator object to list's __init__ method, which then calls its extend method internally. As the object is not a list or tuple, CPython first gets its iterator and then keeps adding items to the list until the iterator is exhausted:

    >>> def func_ge():
    ...     list(x**2 for x in y)
    ...
    >>> dis.dis(func_ge)
      2           0 LOAD_GLOBAL              0 (list)
                  3 LOAD_CONST               1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
                  6 LOAD_CONST               2 ('func_ge.<locals>.<genexpr>')
                  9 MAKE_FUNCTION            0
                 12 LOAD_GLOBAL              1 (y)
                 15 GET_ITER
                 16 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 22 POP_TOP
                 23 LOAD_CONST               0 (None)
                 26 RETURN_VALUE
    >>> ge_object = list(dis.get_instructions(func_ge))[1].argval
    >>> ge_object
    <code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
    >>> dis.dis(ge_object)
      2           0 LOAD_FAST                0 (.0)
            >>    3 FOR_ITER                15 (to 21)
                  6 STORE_FAST               1 (x)
                  9 LOAD_FAST                1 (x)
                 12 LOAD_CONST               0 (2)
                 15 BINARY_POWER
                 16 YIELD_VALUE
                 17 POP_TOP
                 18 JUMP_ABSOLUTE            3
            >>   21 LOAD_CONST               1 (None)
                 24 RETURN_VALUE
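To make the contrast between the two disassemblies easier to follow, here is a rough pure-Python sketch of what each form amounts to. The function names listcomp_equivalent and genexpr_equivalent are made up for illustration, and the explicit result.append call only stands in for the LIST_APPEND opcode; CPython does not perform that attribute lookup in a real comprehension.

```python
def listcomp_equivalent(iterable):
    """Roughly what [x**2 for x in iterable] does: build the list in place."""
    result = []
    for x in iterable:
        # Stand-in for LIST_APPEND; the real opcode skips the .append lookup.
        result.append(x**2)
    return result


def genexpr_equivalent(iterable):
    """Roughly what (x**2 for x in iterable) does: hand back one value at a time."""
    for x in iterable:
        yield x**2


squares_lc = listcomp_equivalent(range(10))
# list() drains the generator through the iterator protocol, one next() at a time.
squares_ge = list(genexpr_equivalent(range(10)))
```

Both names end up bound to equal lists; the difference is only in who drives the loop.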

Timing comparisons:

    >>> %timeit [x**2 for x in range(10**6)]
    1 loops, best of 3: 453 ms per loop
    >>> %timeit list(x**2 for x in range(10**6))
    1 loops, best of 3: 478 ms per loop
    >>> %%timeit
    ... out = []
    ... for x in range(10**6):
    ...     out.append(x**2)
    ...
    1 loops, best of 3: 510 ms per loop

The plain loop is slightly slower because out.append is looked up on every iteration. Cache the bound method and time it again:

    >>> %%timeit
    ... out = []; append = out.append
    ... for x in range(10**6):
    ...     append(x**2)
    ...
    1 loops, best of 3: 467 ms per loop
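The %timeit magic is only available in IPython. As a hedged alternative, the same four variants can be timed with the standard-library timeit module. The function names and the smaller N below are choices made for this sketch, not part of the original measurements, and the numbers will vary by machine and Python version.

```python
import timeit

N = 10**4  # assumption: smaller than the 10**6 used in the timings so this runs quickly


def with_listcomp():
    return [x**2 for x in range(N)]


def with_list_genexpr():
    return list(x**2 for x in range(N))


def with_loop():
    out = []
    for x in range(N):
        out.append(x**2)  # attribute lookup on every iteration
    return out


def with_cached_append():
    out = []
    append = out.append  # look the bound method up once
    for x in range(N):
        append(x**2)
    return out


# Best of 3 repeats, 20 calls each; taking the minimum reduces noise from other processes.
timings = {
    fn.__name__: min(timeit.repeat(fn, number=20, repeat=3))
    for fn in (with_listcomp, with_list_genexpr, with_loop, with_cached_append)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```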

Apart from the fact that list comprehensions no longer leak their loop variables, one more difference is that something like this is no longer valid:

    >>> [x**2 for x in 1, 2, 3] # Python 2
    [1, 4, 9]
    >>> [x**2 for x in 1, 2, 3] # Python 3
      File "<ipython-input-69-bea9540dd1d6>", line 1
        [x**2 for x in 1, 2, 3]
                        ^
    SyntaxError: invalid syntax
    >>> [x**2 for x in (1, 2, 3)] # Add parenthesis
    [1, 4, 9]
    >>> for x in 1, 2, 3: # Python 3: For normal loops it still works
    ...     print(x**2)
    ...
    1
    4
    9
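The syntax difference can also be checked without a REPL by asking the parser directly. This is a small sketch for Python 3; the helper name compiles is made up for illustration.

```python
def compiles(source, mode):
    """Return True if this interpreter's parser accepts `source`."""
    try:
        compile(source, "<example>", mode)
    except SyntaxError:
        return False
    return True


# Bare tuple as the iterable of a comprehension: rejected on Python 3.
bare_tuple_in_comprehension = compiles("[x**2 for x in 1, 2, 3]", "eval")
# Parenthesized tuple: accepted.
parenthesized_tuple = compiles("[x**2 for x in (1, 2, 3)]", "eval")
# Bare tuple in an ordinary for statement: still accepted.
bare_tuple_in_for_loop = compiles("for x in 1, 2, 3:\n    pass\n", "exec")
```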

Thanks for the detailed response! Interesting how timeit shows negligible differences between the list comprehension and the generator expression passed to list, despite very different underlying bytecode (and C code).
– zehnpaard
May 8 '15 at 1:10

@zehnpaard Quoting from Guido's post: And before you start worrying about list comprehensions becoming slow in Python 3: thanks to the enormous implementation effort that went into Python 3 to speed things up in general, both list comprehensions and generator expressions in Python 3 are actually faster than they were in Python 2! (And there is no longer a speed difference between the two.)
– Ashwini Chaudhary
May 8 '15 at 12:16

How do you manage to run timeit with percentage signs directly from the python shell?
– Zaar Hai
May 1 at 2:42

@ZaarHai This is the IPython shell, run with the --classic argument.
– Ashwini Chaudhary
May 1 at 2:52

@zehnpaard The difference is (no longer) negligible - the first version is about 30% faster, see also stackoverflow.com/q/52053579/5769463
– ead
Aug 29 at 6:03

Both forms create and call an anonymous function. However, the list(...) form creates a generator function and passes the returned generator-iterator to list, while with the [...] form, the anonymous function builds the list directly with LIST_APPEND opcodes.

The following code gets decompilation output of the anonymous functions for an example comprehension and its corresponding genexp-passed-to-list:

    import dis

    def f():
        [x for x in []]

    def g():
        list(x for x in [])

    dis.dis(f.__code__.co_consts[1])
    dis.dis(g.__code__.co_consts[1])

The output for the comprehension is

      4           0 BUILD_LIST               0
                  3 LOAD_FAST                0 (.0)
            >>    6 FOR_ITER                12 (to 21)
                  9 STORE_FAST               1 (x)
                 12 LOAD_FAST                1 (x)
                 15 LIST_APPEND              2
                 18 JUMP_ABSOLUTE            6
            >>   21 RETURN_VALUE

The output for the genexp is

      7           0 LOAD_FAST                0 (.0)
            >>    3 FOR_ITER                11 (to 17)
                  6 STORE_FAST               1 (x)
                  9 LOAD_FAST                1 (x)
                 12 YIELD_VALUE
                 13 POP_TOP
                 14 JUMP_ABSOLUTE            3
            >>   17 LOAD_CONST               0 (None)
                 20 RETURN_VALUE
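Exact offsets and the layout of co_consts depend on the CPython version. In particular, since Python 3.12 (PEP 709) list comprehensions are inlined into the enclosing function, so co_consts[1] may no longer be a separate code object for f. A more version-tolerant sketch is to collect opcode names from a function's code and everything nested in it; the helper name opnames is made up for illustration, and f here returns its list, unlike the version above.

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and any code objects nested in it."""
    names = {instruction.opname for instruction in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


def f():
    return [x for x in []]


def g():
    return list(x for x in [])


f_ops = opnames(f.__code__)
g_ops = opnames(g.__code__)

# The comprehension appends directly; the generator expression yields instead.
print("LIST_APPEND" in f_ops, "YIELD_VALUE" in f_ops)
print("YIELD_VALUE" in g_ops)
```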

Note that the name list also needs to be looked up explicitly, since it may have been shadowed.
– Karl Knechtel
May 7 '15 at 9:15

Thanks for this, the dis module is always insightful but also occasionally a bit of a mystery - in your genexp example, it looks like the list creation is omitted completely if I read correctly, but I have no idea why...
– zehnpaard
May 8 '15 at 1:12

@zehnpaard: That's not part of the anonymous function; the list call handles that.
– user2357112
May 8 '15 at 1:16

Ah fair enough, and g.__code__.co_consts[1] specifically points to the anonymous function?
– zehnpaard
May 8 '15 at 1:18

@zehnpaard: It points to the code object used to construct the anonymous function.
– user2357112
May 8 '15 at 1:22

You can actually show that the two can have different outcomes to prove they are inherently different:

    >>> list(next(iter([])) if x > 3 else x for x in range(10))
    [0, 1, 2, 3]
    >>> [next(iter([])) if x > 3 else x for x in range(10)]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 1, in <listcomp>
    StopIteration

The expression inside the comprehension is not treated as a generator since the comprehension does not handle the StopIteration, whereas the list constructor does.
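On Python 3.7 and later, PEP 479 changes the first result: a StopIteration escaping a generator body is turned into RuntimeError instead of silently ending the iteration. The two forms still behave differently, just with different exceptions. The following sketch assumes Python 3.7+; the helper names exhausted and outcome are made up for illustration.

```python
def exhausted():
    """Raise StopIteration by calling next() on an empty iterator."""
    return next(iter([]))


def outcome(build):
    """Run `build` and report either its result or the name of the exception raised."""
    try:
        return build()
    except (StopIteration, RuntimeError) as exc:
        return type(exc).__name__


# The comprehension lets the StopIteration propagate unchanged.
from_listcomp = outcome(lambda: [exhausted() if x > 3 else x for x in range(10)])
# Inside the generator, PEP 479 converts the StopIteration into RuntimeError.
from_list_genexpr = outcome(lambda: list(exhausted() if x > 3 else x for x in range(10)))
```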

Note: in Python 3.7/3.8 the top one raises RuntimeError: generator raised StopIteration; see python.org/dev/peps/pep-0479
– Chris_Rands
Aug 28 at 13:57

They aren't the same: list() evaluates whatever is given to it after what is in the parentheses has finished executing, not before.

The [] in Python is a bit magical: it tells Python to wrap whatever is inside it as a list, more like a type hint for the language.
