Are list comprehensions syntactic sugar for `list(generator expression)` in Python 3?




In Python 3, is a list comprehension simply syntactic sugar for a generator expression fed into the list function?





e.g. is the following code:


squares = [x**2 for x in range(1000)]



actually converted in the background into the following?


squares = list(x**2 for x in range(1000))



I know the output is identical, and Python 3 fixes the surprising side-effects on surrounding namespaces that list comprehensions had, but in terms of what the CPython interpreter does under the hood, is the former converted to the latter, or is there any difference in how the code gets executed?



I found this claim of equivalence in the comments section to this question, and a quick google search showed the same claim being made here.



There was also some mention of this in the What's New in Python 3.0 docs, but the wording is somewhat vague:



Also note that list comprehensions have different semantics: they are closer to syntactic sugar for a generator expression inside a list() constructor, and in particular the loop control variables are no longer leaked into the surrounding scope.
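The scoping part of that quote is easy to check directly. The sketch below assumes Python 3 and runs each snippet in a fresh namespace via `exec`, so nothing pre-existing can interfere. `leaks` is a hypothetical helper name. It reports whether the loop variable `x` is left behind afterwards.

```python
def leaks(source):
    """Run source in a fresh namespace and report whether 'x' is left behind."""
    namespace = {}
    exec(source, namespace)
    return "x" in namespace


# List comprehension: x stays inside the comprehension's own scope.
print(leaks("squares = [x**2 for x in range(5)]"))
# Generator expression passed to list(): also no leak.
print(leaks("squares = list(x**2 for x in range(5))"))
# Plain for loop: x remains bound after the loop.
print(leaks("squares = []\nfor x in range(5):\n    squares.append(x**2)"))
```

On Python 3 this prints `False`, `False`, `True`. On Python 2, the first call would print `True`.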





I'm not sure that this assumption is correct. AFAIK a list comprehension is syntactic sugar for a for loop, whereas a generator expression has very different semantics, namely that it "generates" values iteratively. *shrugs* Maybe the semantics have changed in Python 3 :)
– James Mills
May 7 '15 at 9:01






NB: The exact wording is "closer to syntactic sugar for a generator expression inside a list() constructor", not that they are exactly that.
– James Mills
May 7 '15 at 9:04







@JamesMills Thanks, right, precisely - 'closer to syntactic sugar' is the bit that has me somewhat baffled. I mean, how close exactly? Is it definitively not syntactic sugar?
– zehnpaard
May 7 '15 at 9:11





When in doubt, use the dis module to check.
– Karl Knechtel
May 7 '15 at 9:12
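Here is a small sketch of that suggestion. It walks a function's code object and every code object nested in its constants, then collects the opcode names. It assumes CPython's `dis` module and code-object layout. Where the opcodes live varies between versions: CPython 3.12+ inlines comprehensions into the enclosing function (PEP 709). For that reason it searches recursively instead of indexing `co_consts`. The helper names are hypothetical.

```python
import dis
import types


def iter_code_objects(code):
    """Yield code and, recursively, every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def opnames(func):
    """Collect the opcode names used by func and all of its nested code objects."""
    return {
        instr.opname
        for code in iter_code_objects(func.__code__)
        for instr in dis.get_instructions(code)
    }


def func_lc(y):
    return [x**2 for x in y]


def func_ge(y):
    return list(x**2 for x in y)


lc_ops = opnames(func_lc)
ge_ops = opnames(func_ge)
print("LIST_APPEND" in lc_ops, "YIELD_VALUE" in lc_ops)  # True False
print("LIST_APPEND" in ge_ops, "YIELD_VALUE" in ge_ops)  # False True
```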






4 Answers



The two work differently: the list comprehension version takes advantage of the special LIST_APPEND bytecode, which calls PyList_Append directly. Hence it avoids the attribute lookup of list.append and a function call at the Python level.






>>> def func_lc():
...     [x**2 for x in y]
...
>>> dis.dis(func_lc)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
3 LOAD_CONST 2 ('func_lc.<locals>.<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_GLOBAL 0 (y)
12 GET_ITER
13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
16 POP_TOP
17 LOAD_CONST 0 (None)
20 RETURN_VALUE

>>> lc_object = list(dis.get_instructions(func_lc))[0].argval
>>> lc_object
<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
>>> dis.dis(lc_object)
2 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 16 (to 25)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LOAD_CONST 0 (2)
18 BINARY_POWER
19 LIST_APPEND 2
22 JUMP_ABSOLUTE 6
>> 25 RETURN_VALUE
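Read as Python, the `<listcomp>` code object above behaves roughly like the hypothetical function below. It receives an already-created iterator (the `.0` argument), builds an empty list, and appends each result. This is only an approximation. The real bytecode uses `LIST_APPEND`, so it never looks up or calls `append` at the Python level.

```python
def listcomp_model(iterator):
    """Rough Python equivalent of the <listcomp> code object (hypothetical name)."""
    result = []                # BUILD_LIST 0
    for x in iterator:         # LOAD_FAST .0 / FOR_ITER / STORE_FAST x
        result.append(x ** 2)  # BINARY_POWER, then LIST_APPEND 2 in the real bytecode
    return result              # RETURN_VALUE


y = range(10)
# The enclosing function passes iter(y) to the hidden function (GET_ITER, CALL_FUNCTION 1).
squares = listcomp_model(iter(y))
print(squares == [x**2 for x in y])  # True
```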



On the other hand, the list() version simply passes the generator object to list's __init__ method, which then calls its extend method internally. As the object is not a list or tuple, CPython first gets its iterator and then adds the items to the list until the iterator is exhausted:








>>> def func_ge():
...     list(x**2 for x in y)
...
>>> dis.dis(func_ge)
2 0 LOAD_GLOBAL 0 (list)
3 LOAD_CONST 1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
6 LOAD_CONST 2 ('func_ge.<locals>.<genexpr>')
9 MAKE_FUNCTION 0
12 LOAD_GLOBAL 1 (y)
15 GET_ITER
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>> ge_object = list(dis.get_instructions(func_ge))[1].argval
>>> ge_object
<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
>>> dis.dis(ge_object)
2 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 15 (to 21)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 LOAD_CONST 0 (2)
15 BINARY_POWER
16 YIELD_VALUE
17 POP_TOP
18 JUMP_ABSOLUTE 3
>> 21 LOAD_CONST 1 (None)
24 RETURN_VALUE
>>>
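For comparison, here is a rough pure-Python model of the `list(generator)` path. A generator function stands in for `<genexpr>`, where each result goes out through `YIELD_VALUE`. A loop stands in for the C-level extend that `list()` runs, which calls `next()` until `StopIteration`. Both function names are hypothetical, and the real consumer is C code, not Python.

```python
def genexpr_model(iterator):
    """Rough Python equivalent of the <genexpr> code object (hypothetical name)."""
    for x in iterator:
        yield x ** 2  # YIELD_VALUE hands each value to whoever calls next()


def list_constructor_model(iterable):
    """Approximation of list(iterable) for an argument that is not a list or tuple."""
    result = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:  # the iterator is exhausted
            return result
        result.append(item)


y = range(10)
squares = list_constructor_model(genexpr_model(iter(y)))
print(squares == list(x**2 for x in y))  # True
```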



Timing comparisons:


>>> %timeit [x**2 for x in range(10**6)]
1 loops, best of 3: 453 ms per loop
>>> %timeit list(x**2 for x in range(10**6))
1 loops, best of 3: 478 ms per loop
>>> %%timeit
... out = []
... for x in range(10**6):
...     out.append(x**2)
...
1 loops, best of 3: 510 ms per loop



Normal loops are slightly slower because of the repeated attribute lookup of out.append. Cache the bound method and time again:


>>> %%timeit
... out = []; append = out.append
... for x in range(10**6):
...     append(x**2)
...
1 loops, best of 3: 467 ms per loop
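The timings above use IPython magics. The sketch below runs the same four-way comparison with the standard-library `timeit` module, so it works in a plain interpreter. The values of `N`, `number` and `repeat` are arbitrary choices. Absolute numbers will depend on the machine and CPython version.

```python
import timeit

N = 10**5  # smaller than 10**6 to keep the run short; arbitrary choice


def listcomp():
    return [x**2 for x in range(N)]


def genexp_list():
    return list(x**2 for x in range(N))


def loop_append():
    out = []
    for x in range(N):
        out.append(x**2)  # attribute lookup on every iteration
    return out


def loop_cached_append():
    out = []
    append = out.append  # look up the bound method once
    for x in range(N):
        append(x**2)
    return out


candidates = [listcomp, genexp_list, loop_append, loop_cached_append]

# All four build the same list; only the speed differs.
assert all(f() == candidates[0]() for f in candidates)

for f in candidates:
    best = min(timeit.repeat(f, number=5, repeat=3))
    print(f"{f.__name__:20s} {best / 5 * 1000:.1f} ms per call")
```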



Apart from the fact that list comprehensions no longer leak their variables, one more difference is that something like this is not valid anymore:


>>> [x**2 for x in 1, 2, 3] # Python 2
[1, 4, 9]
>>> [x**2 for x in 1, 2, 3] # Python 3
File "<ipython-input-69-bea9540dd1d6>", line 1
[x**2 for x in 1, 2, 3]
^
SyntaxError: invalid syntax

>>> [x**2 for x in (1, 2, 3)] # Add parentheses
[1, 4, 9]
>>> for x in 1, 2, 3: # Python 3: For normal loops it still works
...     print(x**2)
...
1
4
9
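The same check can be made without triggering the error at the prompt. The sketch below compiles each snippet with the built-in `compile()` and catches `SyntaxError`. `parses` is a hypothetical helper name, and the results assume a Python 3 interpreter.

```python
def parses(source, mode="eval"):
    """Return True if source compiles under the running interpreter."""
    try:
        compile(source, "<example>", mode)
    except SyntaxError:
        return False
    return True


print(parses("[x**2 for x in 1, 2, 3]"))               # False on Python 3
print(parses("[x**2 for x in (1, 2, 3)]"))             # True
print(parses("for x in 1, 2, 3:\n    pass", "exec"))  # True
```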





Thanks for the detailed response! Interesting how timeit shows negligible differences between list comprehensions and generator expressions passed to list, despite very different underlying bytecode (and C code).
– zehnpaard
May 8 '15 at 1:10







@zehnpaard Quoting from Guido's post: And before you start worrying about list comprehensions becoming slow in Python 3: thanks to the enormous implementation effort that went into Python 3 to speed things up in general, both list comprehensions and generator expressions in Python 3 are actually faster than they were in Python 2! (And there is no longer a speed difference between the two.)
– Ashwini Chaudhary
May 8 '15 at 12:16






How do you manage to run timeit with percentage signs directly from the python shell?
– Zaar Hai
May 1 at 2:42





@ZaarHai This is the IPython shell, run with the --classic argument.
– Ashwini Chaudhary
May 1 at 2:52







@zehnpaard The difference is (no longer) negligible - the first version is about 30% faster, see also stackoverflow.com/q/52053579/5769463
– ead
Aug 29 at 6:03



Both forms create and call an anonymous function. However, the list(...) form creates a generator function and passes the returned generator-iterator to list, while with the [...] form, the anonymous function builds the list directly with LIST_APPEND opcodes.











The following code gets decompilation output of the anonymous functions for an example comprehension and its corresponding genexp-passed-to-list:




import dis

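def f():
    [x for x in []]

def g():
    list(x for x in [])

# On the CPython version used for this answer, co_consts[1] is the nested code
# object; newer versions (e.g. 3.12+, which inlines comprehensions) lay it out differently.
dis.dis(f.__code__.co_consts[1])
dis.dis(g.__code__.co_consts[1])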



The output for the comprehension is


4 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE



The output for the genexp is


7 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 11 (to 17)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 YIELD_VALUE
13 POP_TOP
14 JUMP_ABSOLUTE 3
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
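You can also check the "generator function" part without reading bytecode, by inspecting the `CO_GENERATOR` flag of each nested code object. The sketch below assumes CPython. It searches `co_consts` by type rather than by index because the layout differs between versions. On CPython 3.12+ the comprehension has no nested code object at all (PEP 709). `nested_generator_flags` is a hypothetical helper name.

```python
import inspect
import types


def f():
    [x for x in []]


def g():
    list(x for x in [])


def nested_generator_flags(func):
    """For each code object nested directly in func, report whether it is a generator."""
    return [
        bool(const.co_flags & inspect.CO_GENERATOR)
        for const in func.__code__.co_consts
        if isinstance(const, types.CodeType)
    ]


print(nested_generator_flags(f))  # [False] before 3.12, [] on 3.12+ (inlined)
print(nested_generator_flags(g))  # [True]
```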





Note that the name list also needs to be looked up explicitly, since it may have been shadowed.
– Karl Knechtel
May 7 '15 at 9:15
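Here is a short illustration of that point. If the name `list` is rebound, `list(genexp)` picks up the new binding. The comprehension is unaffected because it never looks up `list`. Rebinding `list` to `tuple` is purely hypothetical. It happens inside a function so the builtin stays untouched.

```python
def shadowed():
    list = tuple  # hypothetical shadowing, local to this function
    from_comprehension = [x for x in range(3)]
    from_genexp = list(x for x in range(3))  # actually calls tuple(...)
    return from_comprehension, from_genexp


print(shadowed())  # ([0, 1, 2], (0, 1, 2))
```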







Thanks for this, the dis module is always insightful but also occasionally a bit of a mystery - in your genexp example, it looks like the list creation is omitted completely if I read correctly, but I have no idea why...
– zehnpaard
May 8 '15 at 1:12







@zehnpaard: That's not part of the anonymous function; the list call handles that.
– user2357112
May 8 '15 at 1:16







Ah fair enough, and g.__code__.co_consts[1] specifically points to the anonymous function?
– zehnpaard
May 8 '15 at 1:18







@zehnpaard: It points to the code object used to construct the anonymous function.
– user2357112
May 8 '15 at 1:22



You can actually show that the two can have different outcomes to prove they are inherently different:


>>> list(next(iter([])) if x > 3 else x for x in range(10))
[0, 1, 2, 3]

>>> [next(iter([])) if x > 3 else x for x in range(10)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
StopIteration



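The expression inside the comprehension does not run inside a generator, so nothing catches the StopIteration and it propagates to the caller. In the generator expression version, the StopIteration escapes the generator, and the list constructor treats it as the normal end of iteration, silently truncating the result.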
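Because the generator-expression outcome depends on the Python version (see PEP 479), here is a sketch that records what each form does instead of letting the exception escape. `outcome` is a hypothetical helper name. The `StopIteration` is provoked with `next(iter([]))`, exactly as in the example above.

```python
import sys


def outcome(build):
    """Call build() and return its result, or the name of the exception it raised."""
    try:
        return build()
    except StopIteration:
        return "StopIteration"
    except RuntimeError:
        return "RuntimeError"


genexp_result = outcome(lambda: list(next(iter([])) if x > 3 else x for x in range(10)))
listcomp_result = outcome(lambda: [next(iter([])) if x > 3 else x for x in range(10)])

print(sys.version_info[:2], genexp_result, listcomp_result)
# Python 3.7+: RuntimeError StopIteration (PEP 479 turns the escaping StopIteration into RuntimeError)
# Python 3.5/3.6: [0, 1, 2, 3] StopIteration
```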









Note, in Python 3.7/3.8 the top one raises RuntimeError: generator raised StopIteration see python.org/dev/peps/pep-0479
– Chris_Rands
Aug 28 at 13:57






They aren't the same: list() will evaluate whatever is given to it after what is in the parentheses has finished executing, not before.





The [] in Python is a bit magical: it tells Python to wrap whatever is inside it as a list, more like a type hint for the language.






