Recently, I've been TA-ing an introduction to functional programming course that uses Haskell. It has been a very enlightening experience, as it once again proves the notion that the best way to learn something is to teach it.
Since Haskell is rife with topics that I feel I understand but find tricky to actually explain[^1], trying to answer student questions can sometimes prove more challenging than expected. This post is my attempt to answer one such question that I got the other day.
Disclaimer: this post is aimed at Haskell beginners or teachers.
As part of their homework, the students had to reimplement some basic list functions on their own. One of the requirements was that, when possible, a function accepting a list should work seamlessly for both finite and infinite lists. Given that Haskell is lazy by default, this tends to happen magically on its own, without any extra effort. But understanding when this works and when it doesn't is another matter.
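For example (a tiny illustration of my own, not from the homework), mapping over an infinite list is unproblematic, because `take` only ever demands the first five results:

```haskell
-- map lazily transforms an infinite list; take only forces
-- the first five elements of the result.
take 5 (map (2 *) [1..]) == [2, 4, 6, 8, 10]
```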
Here I'll try to build some intuition for the following question:

> When is a function that operates on lists "lazy enough" to handle infinite lists?
The answer I'm going to propose is that[^2]:

> A function on lists is "lazy enough"[^6] when, for all inputs, it uses a finite chunk of the input to produce a finite chunk of the output.
This answer might be even more confusing than the question itself, so let's work through some examples to build intuition.
Take the `head` function[^3]:

```haskell
head (x : _) = x
head [] = error "head on an empty list"
```

`head` takes in a list, potentially an infinite one, but it only looks at the head of the list (a finite chunk) and ignores the tail (a potentially infinite chunk of data). So `head` is "lazy enough" to handle infinite lists:

```haskell
head [1..] == 1
```
What about `tail`?

```haskell
tail (_ : xs) = xs
tail [] = error "tail on an empty list"
```

To produce a finite chunk of the tail, we only need to skip the head (a finite chunk) and then produce a (finite) chunk of the tail of the list. So this works:

```haskell
head $ tail [1..] == 2
```
Similarly, a function like `take n` only consumes a finite chunk of the input (at most `n` elements) to produce a finite chunk of the output (a list of at most `n` elements).
What are examples that are not lazy enough?
Take `length` for example:

```haskell
length [] = 0
length (_ : xs) = 1 + length xs
```

Here, to produce a finite chunk of the result (a single number), we must consume the whole list (a potentially infinite chunk of data). So there's no way this can work for infinite lists:

```haskell
length [1..] -- never terminates
```
The same goes for `reverse`: the first element of the result of `reverse` is the last element of the input list. To produce a finite chunk of the result (the last element of the input), we must consume the whole, potentially infinite, list. And so `reverse` does not work for infinite lists.
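One possible definition of `reverse` makes this explicit (the Prelude uses an accumulator instead, but it has the same property):

```haskell
-- The first element of the output appears only after the recursion
-- has reached the end of the input list.
reverse [] = []
reverse (x : xs) = reverse xs ++ [x]
```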
An important thing to note is that different implementations of the same function can behave differently with respect to laziness. Consider the following alternative implementation of `head`:

```haskell
badHead xs =
  if length xs > 0 then head' xs
  else error "head on empty list"
  where
    head' (x : _) = x
```

This implementation tries to be "smart" about checking the error condition. But now, in order to produce a finite chunk (the head of the list), we must consume the whole (potentially infinite) list to check its length. The new implementation of `head` no longer works for infinite lists:

```haskell
badHead [1..] -- never terminates
```
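For contrast, a variation of my own: replacing the `length` check with `null` restores laziness, because `null` only needs to inspect the outermost constructor of the list (a finite chunk):

```haskell
-- null only checks whether the list starts with (:) or [],
-- so the infinite tail is never evaluated.
goodHead xs =
  if null xs then error "head on empty list"
  else head' xs
  where
    head' (x : _) = x

goodHead [1..] == 1
```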
Now let's take a look at something more complicated. Figuring out whether the functions above are lazy enough is probably not that difficult. Let's analyze a more challenging function: `partialSums`.

Given a list of numbers, `partialSums` creates a new list of the sums of all prefixes of the input, starting with the empty prefix (whose sum is 0):

```haskell
partialSums [1, 2, 3, 4, 5] == [0, 1, 3, 6, 10, 15]
```

And it should also work for infinite lists:

```haskell
take 6 $ partialSums [1..] == [0, 1, 3, 6, 10, 15]
```
We can implement this function using explicit recursion:

```haskell
partialSums [] = [0]
partialSums (x : xs) = 0 : map (x +) (partialSums xs)
```
Why does this work for infinite lists? Let's say we want the first element of `partialSums [1..]`. We can follow the equations in this definition[^4]:

```haskell
partialSums [1..] == -- pattern matching
partialSums (1 : [2..]) == -- by definition
0 : map (1 +) (partialSums [2..])
```

Assuming that `(:)` is lazy (it is), we can access the first element, `0`, without accessing the rest of the (infinite) result.
To be able to continue the evaluation, recall the definition of `map`:

```haskell
map _ [] = []
map f (x : xs) = f x : map f xs
```

`map` itself is lazy: to get the first element of its result, we only need to apply `f` to the head of the input. There's no need to go over the possibly infinite tail.
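We can demonstrate this with `undefined`, a value that crashes the program the moment it is evaluated:

```haskell
-- map pattern matches on the first (:) but never touches the tail,
-- so the undefined tail is left alone.
head (map (1 +) (0 : undefined)) == 1
```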
Let's continue the evaluation and grab the next element:

```haskell
0 : map (1 +) (partialSums [2..]) == -- evaluate the first step of `partialSums` so that we can pattern match on the result
0 : map (1 +) (0 : (map (2 +) (partialSums [3..]))) == -- by definition of `map`
0 : (1 + 0) : (map (1 +) (map (2 +) (partialSums [3..]))) == -- reduce arithmetic
0 : 1 : (map (1 +) (map (2 +) (partialSums [3..])))
```
Notice how pattern matching forced us to evaluate the next step of `partialSums`: we can't choose the correct branch of `map` without knowing whether the list is empty or not. But we don't need anything beyond that; we do not evaluate the whole list, just enough to see that the first head/tail pair exists. More generally, pattern matching is one way to gradually force the evaluation of lists.
The next element, `1`, is available for grabbing, while the rest of the tail is not evaluated by `(:)`.
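To see the difference between "the first head/tail pairs exist" and "the whole list is evaluated", here's a small experiment (again using `undefined` as a stand-in for a value we must not touch):

```haskell
take 2 (0 : 1 : undefined) == [0, 1] -- (:) never forces the tail
length (0 : 1 : undefined)           -- crashes: length forces the whole spine
```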
Just for fun, we'll repeat this once again for the third element:

```haskell
0 : 1 : (map (1 +) (map (2 +) (partialSums [3..]))) == -- evaluate `partialSums`
0 : 1 : (map (1 +) (map (2 +) (0 : (map (3 +) (partialSums [4..]))))) == -- evaluate the second `map`
0 : 1 : (map (1 +) ((2 + 0) : (map (2 +) (map (3 +) (partialSums [4..]))))) == -- reduce arithmetic
0 : 1 : (map (1 +) (2 : (map (2 +) (map (3 +) (partialSums [4..]))))) == -- apply the first `map`
0 : 1 : (1 + 2) : (map (1 +) (map (2 +) (map (3 +) (partialSums [4..])))) == -- reduce arithmetic
0 : 1 : 3 : (map (1 +) (map (2 +) (map (3 +) (partialSums [4..]))))
```
And now `3` is available to us, using only a finite number of elements from the input list. We can go on like this, each time consuming just one element from the input list to produce one element of the output. Since `(:)` and `map` are both lazy, we are never forced to evaluate the whole tail, just a finite chunk of it.
We can also rewrite `partialSums` in terms of `foldr`:

```haskell
partialSums = foldr aux [0]
  where aux x res = 0 : map (x +) res
```

This is a mechanical rewriting of the recursive solution[^5]. Since `foldr` implements the same recursion pattern that we wrote manually before, this works just like the recursive version. Therefore, it still works for infinite lists.
We can prove it using the same logic; we just have to apply `foldr` first. Recall that `foldr` is defined as:

```haskell
foldr _ z [] = z
foldr f z (x : xs) = f x (foldr f z xs)
```
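Note that `foldr` itself can be lazy enough: if the combining function ignores its second argument, the recursion never has to enter the tail at all:

```haskell
-- The lambda discards the folded tail, so foldr stops immediately.
foldr (\x _ -> x) 0 [1..] == 1
```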
Now we can evaluate the new `partialSums` again:

```haskell
partialSums [1..] == -- by definition of `partialSums`
foldr aux [0] [1..] == -- pattern matching in `foldr`
foldr aux [0] (1 : [2..]) == -- by definition of `foldr`
aux 1 (foldr aux [0] [2..]) == -- apply the same logic recursively
aux 1 (aux 2 (foldr aux [0] [3..])) ==
aux 1 (aux 2 (aux 3 (foldr aux [0] [4..]))) == -- expand the `aux` calls
aux 1 (aux 2 (0 : (map (3 +) (foldr aux [0] [4..])))) ==
aux 1 (0 : (map (2 +) (0 : (map (3 +) (foldr aux [0] [4..]))))) ==
0 : (map (1 +) (0 : (map (2 +) (0 : (map (3 +) (foldr aux [0] [4..]))))))
```
From here on, this looks pretty much the same as the recursive version, and it is lazy for the same reasons.
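By contrast (a variation of my own, not part of the homework), the same function written with `foldl` is not lazy enough. `foldl` must reach the end of the list before its accumulator becomes available:

```haskell
-- foldl recurses through the entire input before returning its
-- accumulator, so nothing is ever produced for an infinite list.
partialSumsL = foldl aux [0]
  where aux res x = res ++ [last res + x]

take 6 (partialSumsL [1..]) -- never terminates
```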
Now suppose that we had a non-lazy version of `map`:

```haskell
badMap f xs =
  if length xs == 0 then []
  else f (head xs) : badMap f (tail xs)
```

This is logically the same as the regular `map`, but it is no longer lazy, since it calls `length`, which forces the evaluation of the whole list.
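Even asking for a single element now forces the whole input:

```haskell
head (badMap (1 +) [1..]) -- never terminates
```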
If we use `badMap` to implement `partialSums`:

```haskell
badPartialSums = foldr aux [0]
  where aux x res = 0 : badMap (x +) res
```

then the result will no longer be sufficiently lazy, and won't work for infinite lists:

```haskell
take 6 $ badPartialSums [1..] -- never terminates
```
If our function calls a "not lazy enough" function, then it stops being "lazy enough".
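As an aside (not part of the homework): the standard library already provides this pattern as `scanl`, which folds from the left but emits each intermediate accumulator as soon as it's ready, so it is lazy enough:

```haskell
-- scanl (+) 0 computes exactly the partial sums.
take 6 (scanl (+) 0 [1..]) == [0, 1, 3, 6, 10, 15]
```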
Hopefully the statement we started with now makes more sense:

> A function on lists is "lazy enough" when, for all inputs, it uses a finite chunk of the input to produce a finite chunk of the output.
Thanks for reading this far. I hope this helped you gain some intuition about laziness and infinite lists. Reach out if you have any further questions or comments.
[^1]: `foldr`, anyone?

[^2]: This is not a formal definition, but it's good enough for our purposes. I'm not trying to be precise and fully explain laziness in Haskell, just to generate enough intuition for the students' homework.

[^3]: We'll ignore for the moment that `head` and `tail` are not safe and should be banished from our lives. I don't want to bring in `Maybe`, which might be distracting from the point.

[^4]: This process of moving through equations is called "equational reasoning". It is a very useful tool when trying to understand Haskell code.

[^5]: I'll spare myself from explaining the workings of `foldr` in this post.

[^6]: As was noted in the Reddit discussion, I'm actually overloading the word "lazy", when what I actually mean by "lazy enough" is "productive". Since we're using Haskell, laziness is entangled with productivity anyway. I'm still on the fence about whether it's worthwhile to introduce this new terminology in the context of this particular homework set.