Are list comprehensions syntactic sugar for `list(generator expression)` in Python 3?
In Python 3, is a list comprehension simply syntactic sugar for a generator expression fed into the `list` function?
For example, is the following code:
```python
squares = [x**2 for x in range(1000)]
```
actually converted in the background into the following?
```python
squares = list(x**2 for x in range(1000))
```
I know the output is identical, and that Python 3 fixes the surprising side effects on surrounding namespaces that list comprehensions had, but in terms of what the CPython interpreter does under the hood, is the former converted to the latter, or are there any differences in how the code gets executed?
I found this claim of equivalence in the comments section of this question, and a quick Google search showed the same claim being made here.
There was also some mention of this in the What's New in Python 3.0 docs, but the wording is somewhat vague:
> Also note that list comprehensions have different semantics: they are closer to syntactic sugar for a generator expression inside a `list()` constructor, and in particular the loop control variables are no longer leaked into the surrounding scope.
NB: The exact wording is "closer to a generator expression in the `list()` constructor". Not that they are exactly that.
– James Mills
May 7 '15 at 9:04
@JamesMills Thanks, right, precisely - 'closer to syntactic sugar' is the bit that has me somewhat baffled. I mean, how close exactly? Is it definitively not syntactic sugar?
– zehnpaard
May 7 '15 at 9:11
When in doubt, use the `dis` module to check.
– Karl Knechtel
May 7 '15 at 9:12
4 Answers
Both work differently. The list comprehension version takes advantage of the special bytecode `LIST_APPEND`, which calls `PyList_Append` directly for us. Hence it avoids an attribute lookup of `list.append` and a function call at the Python level.
```
>>> def func_lc():
...     [x**2 for x in y]
...
>>> dis.dis(func_lc)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
              3 LOAD_CONST               2 ('func_lc.<locals>.<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              0 (y)
             12 GET_ITER
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 POP_TOP
             17 LOAD_CONST               0 (None)
             20 RETURN_VALUE
>>> lc_object = list(dis.get_instructions(func_lc))[0].argval
>>> lc_object
<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
>>> dis.dis(lc_object)
  2           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                16 (to 25)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 LOAD_CONST               0 (2)
             18 BINARY_POWER
             19 LIST_APPEND              2
             22 JUMP_ABSOLUTE            6
        >>   25 RETURN_VALUE
```
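The listing above comes from an older CPython 3 release, and bytecode details shift between versions; for instance, CPython 3.12 inlines list comprehensions (PEP 709), so there is no separate `<listcomp>` code object left to fetch via `argval`. A minimal sketch that should hold up across versions compiles both forms and collects opcode names from each code object and any nested ones. The helper `opnames` is a hypothetical name, not part of `dis`.

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and every code object nested in it."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


# compile() only builds bytecode; nothing runs, so the free name `y` needs no value.
lc_code = compile("[x**2 for x in y]", "<lc>", "eval")
ge_code = compile("list(x**2 for x in y)", "<ge>", "eval")

# The generator expression's body lives in its own nested code object.
genexpr_code = next(c for c in ge_code.co_consts if isinstance(c, types.CodeType))

lc_ops = opnames(lc_code)            # works whether the comprehension is nested or inlined
genexpr_ops = opnames(genexpr_code)

print("comprehension uses LIST_APPEND:", "LIST_APPEND" in lc_ops)
print("genexpr uses YIELD_VALUE:", "YIELD_VALUE" in genexpr_ops)
print("genexpr uses LIST_APPEND:", "LIST_APPEND" in genexpr_ops)
```

The recursion is what makes the check version-tolerant: on older releases `LIST_APPEND` sits in the nested `<listcomp>` code object, on 3.12+ it sits in the enclosing code.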
On the other hand, the `list()` version simply passes the generator object to `list`'s `__init__` method, which then calls its `extend` method internally. As the object is not a list or tuple, CPython first gets its iterator and then simply adds the items to the list until the iterator is exhausted:
```
>>> def func_ge():
...     list(x**2 for x in y)
...
>>> dis.dis(func_ge)
  2           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
              6 LOAD_CONST               2 ('func_ge.<locals>.<genexpr>')
              9 MAKE_FUNCTION            0
             12 LOAD_GLOBAL              1 (y)
             15 GET_ITER
             16 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
>>> ge_object = list(dis.get_instructions(func_ge))[1].argval
>>> ge_object
<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
>>> dis.dis(ge_object)
  2           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                15 (to 21)
              6 STORE_FAST               1 (x)
              9 LOAD_FAST                1 (x)
             12 LOAD_CONST               0 (2)
             15 BINARY_POWER
             16 YIELD_VALUE
             17 POP_TOP
             18 JUMP_ABSOLUTE            3
        >>   21 LOAD_CONST               1 (None)
             24 RETURN_VALUE
```
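The `list()` side of this can be modelled in pure Python. The sketch below is a simplification: `list_from_iterable` is a hypothetical name, and the real constructor is written in C with fast paths (for example for lists and tuples) that are omitted here. It only shows the shape of the loop: get an iterator, then append items until `StopIteration` signals exhaustion.

```python
def list_from_iterable(iterable):
    """Simplified pure-Python model of list(iterable); not CPython's actual C code."""
    result = []
    iterator = iter(iterable)    # for a generator, iter() returns the generator itself
    append = result.append       # look the bound method up once instead of per item
    while True:
        try:
            item = next(iterator)    # resumes the generator until its next yield
        except StopIteration:        # raised when the generator body finishes
            break
        append(item)
    return result


y = range(5)
print(list_from_iterable(x**2 for x in y))  # [0, 1, 4, 9, 16]
print(list(x**2 for x in y))                # [0, 1, 4, 9, 16]
```

Every item crosses a generator suspend/resume boundary here, whereas the comprehension's `LIST_APPEND` loop never leaves its own frame.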
Timing comparisons:
```
>>> %timeit [x**2 for x in range(10**6)]
1 loops, best of 3: 453 ms per loop
>>> %timeit list(x**2 for x in range(10**6))
1 loops, best of 3: 478 ms per loop
>>> %%timeit
... out = []
... for x in range(10**6):
...     out.append(x**2)
...
1 loops, best of 3: 510 ms per loop
```
Normal loops are slightly slower because `out.append` is looked up as an attribute on every iteration. Cache it and time again:
```
>>> %%timeit
... out = []; append = out.append
... for x in range(10**6):
...     append(x**2)
...
1 loops, best of 3: 467 ms per loop
```
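The `%timeit` and `%%timeit` magics only exist in IPython. A rough equivalent for the plain interpreter uses the standard `timeit` module. This sketch uses a smaller `n` (an arbitrary choice) so it finishes quickly, and absolute numbers will differ by machine and CPython version.

```python
import timeit

N = 10**5    # arbitrary; smaller than the 10**6 above so the script runs quickly
NUMBER = 5   # executions per timing run

# timeit compiles each statement string; `n` is defined by the setup string.
STATEMENTS = {
    "list comprehension": "[x**2 for x in range(n)]",
    "list(generator expression)": "list(x**2 for x in range(n))",
    "loop + out.append": "out = []\nfor x in range(n):\n    out.append(x**2)",
    "loop + cached append": (
        "out = []\nappend = out.append\nfor x in range(n):\n    append(x**2)"
    ),
}

for label, stmt in STATEMENTS.items():
    # best of 3 runs, mirroring %timeit's "best of 3"
    best = min(timeit.repeat(stmt, setup=f"n = {N}", repeat=3, number=NUMBER))
    print(f"{label:28} {best / NUMBER * 1000:7.2f} ms per loop")
```

Taking the minimum of several runs, as `%timeit` did here, reduces noise from other processes; it is still a micro-benchmark, so small differences between the forms are not very meaningful.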
Apart from the fact that list comprehensions no longer leak their loop variables, one more difference is that something like this is no longer valid:
```
>>> [x**2 for x in 1, 2, 3] # Python 2
[1, 4, 9]
>>> [x**2 for x in 1, 2, 3] # Python 3
  File "<ipython-input-69-bea9540dd1d6>", line 1
    [x**2 for x in 1, 2, 3]
                    ^
SyntaxError: invalid syntax
>>> [x**2 for x in (1, 2, 3)] # Add parentheses
[1, 4, 9]
>>> for x in 1, 2, 3: # Python 3: for normal loops it still works
...     print(x**2)
...
1
4
9
```
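The scoping change mentioned above can be checked directly. A minimal sketch (the variable names are arbitrary) showing that in Python 3 neither a list comprehension nor a generator expression rebinds a same-named variable in the enclosing scope:

```python
x = "outer"

squares = [x**2 for x in range(5)]    # this x is local to the comprehension
cubes = list(x**3 for x in range(5))  # likewise for the generator expression

print(squares)  # [0, 1, 4, 9, 16]
print(cubes)    # [0, 1, 8, 27, 64]
print(x)        # outer (in Python 2 the list comprehension would have rebound x to 4)
```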
Thanks for the detailed response! Interesting how `timeit` shows negligible differences between the list comprehensions and the generator expressions being thrown into `list`, despite very different underlying bytecode (and C code).
– zehnpaard
May 8 '15 at 1:10
@zehnpaard Quoting from Guido's post: And before you start worrying about list comprehensions becoming slow in Python 3: thanks to the enormous implementation effort that went into Python 3 to speed things up in general, both list comprehensions and generator expressions in Python 3 are actually faster than they were in Python 2! (And there is no longer a speed difference between the two.)
– Ashwini Chaudhary
May 8 '15 at 12:16
How do you manage to run timeit with percentage signs directly from the python shell?
– Zaar Hai
May 1 at 2:42
@ZaarHai This is the IPython shell, run with the `--classic` argument.
– Ashwini Chaudhary
May 1 at 2:52
@zehnpaard The difference is (no longer) negligible - the first version is about 30% faster, see also stackoverflow.com/q/52053579/5769463
– ead
Aug 29 at 6:03
Both forms create and call an anonymous function. However, the `list(...)` form creates a generator function and passes the returned generator-iterator to `list`, while with the `[...]` form, the anonymous function builds the list directly with `LIST_APPEND` opcodes.
The following code gets the disassembly of the anonymous functions for an example comprehension and its corresponding genexp-passed-to-`list`:
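```python
import dis

def f():
    [x for x in []]

def g():
    list(x for x in [])

dis.dis(f.__code__.co_consts[1])  # code object of the nested <listcomp> function
dis.dis(g.__code__.co_consts[1])  # code object of the nested <genexpr> function
```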
The output for the comprehension is:
```
  4           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                12 (to 21)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 LIST_APPEND              2
             18 JUMP_ABSOLUTE            6
        >>   21 RETURN_VALUE
```
The output for the genexp is:
```
  7           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                11 (to 17)
              6 STORE_FAST               1 (x)
              9 LOAD_FAST                1 (x)
             12 YIELD_VALUE
             13 POP_TOP
             14 JUMP_ABSOLUTE            3
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE
```
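Indexing `co_consts[1]` depends on where the compiler happened to place the nested code object, which may differ between CPython releases. Also, since CPython 3.12 (PEP 709) list comprehensions are inlined, so `f` has no nested code object at all, while `g` still does. A slightly more defensive sketch selects code objects by type rather than position; the helper name `nested_code_objects` is hypothetical.

```python
import dis
import sys
import types


def f():
    [x for x in []]


def g():
    list(x for x in [])


def nested_code_objects(func):
    """Return the code objects among func's constants, found by type, not by index."""
    return [c for c in func.__code__.co_consts if isinstance(c, types.CodeType)]


print("CPython", sys.version_info[:2])
for func in (f, g):
    nested = nested_code_objects(func)
    if not nested:
        # e.g. a list comprehension inlined by CPython 3.12+ (PEP 709)
        print(f"{func.__name__}: no nested code object; disassembling the function itself")
        dis.dis(func)
    for code in nested:
        print(f"{func.__name__}: nested {code.co_name}")
        dis.dis(code)
```

On 3.12+ the first branch runs for `f`, which is itself a quick way to see that the comprehension no longer creates a separate function there.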
Note that the name `list` also needs to be looked up explicitly, since it may have been shadowed.
– Karl Knechtel
May 7 '15 at 9:15
Thanks for this, the `dis` module is always insightful but also occasionally a bit of a mystery. In your genexp example, it looks like the list creation is omitted completely if I read correctly, but I have no idea why...
– zehnpaard
May 8 '15 at 1:12
@zehnpaard: That's not part of the anonymous function; the `list` call handles that.
– user2357112
May 8 '15 at 1:16
Ah fair enough, and `g.__code__.co_consts[1]` specifically points to the anonymous function?
– zehnpaard
May 8 '15 at 1:18
@zehnpaard: It points to the code object used to construct the anonymous function.
– user2357112
May 8 '15 at 1:22
You can actually show that the two can have different outcomes to prove they are inherently different:
```
>>> list(next(iter([])) if x > 3 else x for x in range(10))
[0, 1, 2, 3]
>>> [next(iter([])) if x > 3 else x for x in range(10)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
StopIteration
```
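The expression inside the comprehension is not run inside a generator, so nothing catches the `StopIteration` and it propagates to the caller. In the generator-expression version, the `StopIteration` escapes the generator, and the `list` constructor treats it as the end of iteration.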
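The outcome also depends on the CPython version: since Python 3.7, PEP 479 turns a `StopIteration` that escapes a generator body into `RuntimeError`. A small sketch that runs both forms and reports what reached the caller; the helper names (`value_or_stop`, `outcome`, and so on) are hypothetical.

```python
import sys


def value_or_stop(x):
    # next() on an empty iterator raises StopIteration
    return next(iter([])) if x > 3 else x


def via_list_genexp():
    return list(value_or_stop(x) for x in range(10))


def via_listcomp():
    return [value_or_stop(x) for x in range(10)]


def outcome(build):
    """Call build() and return its result, or a description of the exception raised."""
    try:
        return build()
    except StopIteration:
        return "StopIteration"
    except RuntimeError as exc:  # PEP 479 conversion, Python 3.7+
        return f"RuntimeError: {exc}"


print(sys.version_info[:2])
print("list(genexp):", outcome(via_list_genexp))
print("listcomp:", outcome(via_listcomp))
```

On Python 3.7 and later the first line reports `RuntimeError: generator raised StopIteration`; on earlier versions it returns `[0, 1, 2, 3]` as shown above. The list comprehension raises `StopIteration` in both cases.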
Note, in Python 3.7/3.8 the top one raises `RuntimeError: generator raised StopIteration`; see python.org/dev/peps/pep-0479
– Chris_Rands
Aug 28 at 13:57
They aren't the same: `list()` will evaluate whatever is given to it after what is in the parentheses has finished executing, not before.
The `[]` in Python is a bit magical: it tells Python to wrap whatever is inside it as a list, more like a type hint for the language.
I'm not sure that this assumption is correct. AFAIK a list comprehension is syntactic sugar for a for loop whereas a generator expression has much different semantics -- namely that it "generates" values iteratively. shrugs Maybe the semantics have changed in Python 3 :)
– James Mills
May 7 '15 at 9:01