Python Generators
One of Python's strengths is its powerful generators. Even a basic understanding of them unlocks the ability to elegantly handle even huge datasets. But they hold some surprises in what is executed when -- understanding them at a deeper level allows you to use and debug them more effectively.
Iterables and Iterators
First, some basics. An iterable is an object like a list or a tuple
that can be iterated over in a for loop. We will see shortly that there are
other types of iterables.
a = [1, 2, 3]
for i in a:
    print(i)
# 1
# 2
# 3
An iterator itr is an object with a __next__() method which returns the
next object to be iterated over. If there are no more objects, __next__() will
instead raise a StopIteration exception. The pythonic way to call it is the
builtin next function: next(itr). There is a deep connection between
iterables and iterators; part of the iterable contract is that an iterable
must expose an __iter__() method which returns an iterator. The pythonic
way to call this is using the builtin iter() function: iter(a).
a = [1, 2]
itr = iter(a)
print(next(itr))
# 1
print(next(itr))
# 2
print(next(itr))
# StopIteration
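To make the protocol concrete, here is a minimal sketch of a hand-written iterator. The class name Countdown is made up for illustration; the only real requirements are the __next__ and __iter__ methods (the latter returns the iterator itself, which is what lets list() and for loops accept it).

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator hands back itself when asked for an iterator.
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there is nothing left to iterate over.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


itr = Countdown(3)
print(next(itr))
# 3
print(list(itr))
# [2, 1]
```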
In fact, the for loop over an iterable a can be implemented as:
# Vanilla for loop
for x in a:
    f(x)
# or the iterator version
itr = iter(a)
while True:
    try:
        x = next(itr)
    except StopIteration:
        break
    f(x)
To make life slightly easier, iterators must also be iterables; i.e. they
must expose the __iter__ method, which in this case will just return
themselves:
iter([1, 2, 3]) is [1, 2, 3]
# False
itr = iter([1, 2, 3])
iter(itr) is itr
# True
One important difference between iterators and most iterables is that
iterators are always one-shot, while iterables might be able to be consumed
more than once. If you call next on an iterator (in a for loop or otherwise),
you are consuming one iteration, and it can't be consumed again. When you
iterate over most iterables (like lists, etc), you create a new iterator,
which starts from the beginning.
a = [1, 2]
b = iter([1, 2])
def print1(itbl):
    for x in itbl:
        print(x)
        break
print1(a)
# 1
print1(a)
# 1
print1(a)
# 1
print1(b)
# 1
print1(b)
# 2
print1(b)
# Nothing! We have exhausted b already.
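The same contrast can be sketched with a user-defined iterable. The class Squares below is a hypothetical example, not something from the standard library: its __iter__ builds a fresh iterator on every call, so the object can be looped over repeatedly, while any single iterator obtained from it is still one-shot.

```python
class Squares:
    """Hypothetical iterable: each iter() call starts a fresh pass."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Build a brand-new iterator every time, so passes are independent.
        return iter([i * i for i in range(self.n)])


squares = Squares(3)
first_pass = list(squares)
second_pass = list(squares)   # works again: a new iterator was created

one_shot = iter(squares)
drained = list(one_shot)
leftover = list(one_shot)     # empty: this iterator is already exhausted

print(first_pass, second_pass, drained, leftover)
# [0, 1, 4] [0, 1, 4] [0, 1, 4] []
```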
In fact, for many iterables, you can create multiple independent iterators:
a = [1, 2, 3]
b = iter(a)
c = iter(a)
next(b)
# 1
next(c)
# 1
next(c)
# 2
next(c)
# 3
next(c)
# StopIteration
next(b)
# 2
next(b)
# 3
next(b)
# StopIteration
To summarize, an iterable is something that exposes an iterator via iter, and
an iterator is something that you can call next on.
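If you want to check which side of the contract an object satisfies, the abstract base classes in collections.abc are one way to do it. This is just a small sketch of that check:

```python
from collections.abc import Iterable, Iterator

a = [1, 2, 3]
itr = iter(a)

# A list is an iterable, but it is not itself an iterator.
print(isinstance(a, Iterable), isinstance(a, Iterator))
# True False

# The object returned by iter() is both.
print(isinstance(itr, Iterable), isinstance(itr, Iterator))
# True True
```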
Generators
If you use the yield keyword in a function, it does not return a normal value,
but instead returns a generator.
def f():
    yield 1
g = f()
type(g)
# <class 'generator'>
Generators are in fact iterators.
print(next(g))
# 1
print(next(g))
# StopIteration
Each time you call the function, you'll get a new, independent, iterator.
def f():
    yield 1
g1 = f()
g2 = f()
print(next(g1))
# 1
print(next(g1))
# StopIteration
print(next(g2))
# 1
print(next(g2))
# StopIteration
One of the most important features of a generator is that it lazily yields
values. Between next calls, it suspends execution, but maintains its
internal state. The classic example is the infinite counter, which lazily
produces an infinite sequence of integers.
def counter():
    c = 0
    while True:
        yield c
        c += 1
count = counter()
print(next(count))
# 0
print(next(count))
# 1
This will never raise a StopIteration; it will continue to count forever. Be
careful not to iterate it in a for loop without a break statement!
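One safe way to take a finite prefix of an infinite generator is itertools.islice from the standard library. A minimal sketch, repeating the counter definition so the snippet runs on its own:

```python
from itertools import islice


def counter():
    c = 0
    while True:
        yield c
        c += 1


# islice stops pulling values after five items, so this terminates.
first_five = list(islice(counter(), 5))
print(first_five)
# [0, 1, 2, 3, 4]
```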
Let's examine the sequence of operations of this simple generator more closely.
def counter():
    print('Starting counter')
    c = 0
    while True:
        yield c
        c += 1
        print('Incremented counter to', c)
count = counter()
y = next(count)
# Starting counter
print('Counted', y)
# Counted 0
y = next(count)
# Incremented counter to 1
print('Counted', y)
# Counted 1
Let's examine the flow:
1. count = counter() initializes the generator, assigning an iterator to count. Notice that it does not execute anything inside counter.
2. y = next(count) advances the iterator, executing the code in counter up to the first yield. It assigns the yielded value 0 to y.
3. print('Counted', y) prints the yielded value 0.
4. y = next(count) resumes execution just after the first yield and runs until the second. It assigns the yielded value 1 to y.
5. print('Counted', y) prints the yielded value 1.
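The suspend-and-resume behaviour described above can also be observed directly with inspect.getgeneratorstate from the standard library. This is only a sketch to make the states visible; the counter here omits the print calls:

```python
import inspect


def counter():
    c = 0
    while True:
        yield c
        c += 1


count = counter()
# Nothing inside counter has run yet.
state_before = inspect.getgeneratorstate(count)

next(count)
# The body ran up to the first yield and is now paused there.
state_after = inspect.getgeneratorstate(count)

print(state_before, state_after)
# GEN_CREATED GEN_SUSPENDED
```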
Sending
This is a basic generator that you can iterate over like any other iterator.
But yield also has a special power: it can receive values from outside the
function and assign them to variables inside. Let's make a basic consumer:
def printer():
    print('Starting printer')
    while True:
        x = yield
        print('Printing', x)
p = printer()
y = next(p)
# Starting printer
print('Outside', y)
# Outside None
y = next(p)
# Printing None
print('Outside', y)
# Outside None
This creates a generator that never yields a meaningful value: every yield produces None. Let's follow the flow.
1. p = printer() creates p, which does not do any execution.
2. y = next(p) executes printer up to the line x = yield, which waits for iteration. Since we didn't specify anything to yield, that's the equivalent of yield None, and y is assigned None.
3. print('Outside', y): since y is None, we print Outside None.
4. y = next(p) executes printer from the yield. We assign the result of the yield expression to x. This is also None, because we didn't send anything -- foreshadowing! Since we are yet again not yielding anything, None is returned and assigned to y.
5. print('Outside', y): since y is None, we print Outside None.
For yield to actually receive information, we need to use the send method
of the generator. In the above example, we didn't send anything to p, because
we wanted to emphasize that printer is just a normal generator, but that
yield is an expression whose value can be assigned to a variable. Now let's
actually get to send:
p = printer()
y = p.send(None)
# Starting printer
print('Outside', y)
# Outside None
y = p.send('a')
# Printing a
print('Outside', y)
# Outside None
p.send(2)
# Printing 2
Now we're sending things to the generator to be processed. Let's follow the flow.
1. p = printer() creates p, which does not do any execution.
2. y = p.send(None) starts the generator's execution, exactly like next(p) would. In fact, you could replace this with next(p) and it would be the same. You need to "prime" the generator in this way; if you try to send a non-None value to a just-started generator, it will raise a TypeError.
3. print('Outside', y) prints the value of y, which is None. send returns the next value the generator yields, and printer's bare yield yields None.
4. y = p.send('a') sends the value 'a' to the generator. The line x = yield means that it will assign to x the value that is sent, in this case 'a', and so that is printed before we loop around to the yield statement again.
5. print('Outside', y) shows that y is still None, and will always be so for this generator, because it never yields anything else.
6. p.send(2) sends the value 2, which is printed as expected.
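A generator can also yield a real value and receive one at the same yield, in which case send returns something other than None. The running-average consumer below is a hypothetical example (the name averager is made up here) that sketches both this and the priming rule:

```python
def averager():
    """Hypothetical consumer that yields the running mean of what it is sent."""
    total = 0.0
    n = 0
    average = None
    while True:
        # Yield the current average out, then wait for the next sent value.
        x = yield average
        total += x
        n += 1
        average = total / n


avg = averager()
try:
    avg.send(10)          # not primed yet, so this is rejected
except TypeError as exc:
    error = str(exc)

first = next(avg)         # prime: run up to the first yield
a1 = avg.send(10)
a2 = avg.send(20)
print(first, a1, a2)
# None 10.0 15.0
```

Here each send both delivers a value to x and returns whatever the generator yields next, which is why a1 and a2 hold the updated averages.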
Now how do we tie these together? Here's a very simple way:
count = counter()
p = printer()
p.send(None)
# > Starting printer
for x in range(3):
    p.send(next(count))
# > Starting counter
# > Printing 0
# > Incremented counter to 1
# > Printing 1
# > Incremented counter to 2
# > Printing 2
We say that count is a producer (because it is yielding results to the
outside), and p is a consumer, because we are sending values to it. Of
course, we can have a generator that consumes a producer.
def times2(gen):
    print('Starting times2')
    while True:
        x = next(gen)
        print('Multiplying %s by 2' % x)
        yield 2*x
count = counter()
t2 = times2(count)
next(t2)
# > Starting times2
# > Starting counter
# > Multiplying 0 by 2
# 0
next(t2)
# > Incremented counter to 1
# > Multiplying 1 by 2
# 2
The while loop above is actually just a cumbersome for loop. Leaving out the
print calls, it could be written as:
def times2(gen):
    for x in gen:
        yield 2*x
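One subtlety, assuming Python 3.7 or later: the two versions behave the same for an endless input like counter, but they differ once a finite input runs out. A StopIteration that escapes from inside a generator body is converted to a RuntimeError (PEP 479), whereas the for loop ends cleanly. The function names below are made up to keep the two variants apart:

```python
def times2_while(gen):
    while True:
        x = next(gen)   # raises StopIteration once gen is exhausted
        yield 2 * x


def times2_for(gen):
    for x in gen:       # the for loop absorbs StopIteration and ends cleanly
        yield 2 * x


for_result = list(times2_for(iter([1, 2, 3])))
print(for_result)
# [2, 4, 6]

try:
    list(times2_while(iter([1, 2, 3])))
    outcome = 'finished'
except RuntimeError:
    outcome = 'RuntimeError'
print(outcome)
# RuntimeError
```

This is another reason to prefer the for loop form whenever the input might be finite.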