Everything in Haskell Is a Thunk

There are a lot of misconceptions about Haskell.

Some people think everything in Haskell is a function. Others think that Haskell is just an implementation of Church’s lambda calculus. But both of these are false.

In reality, everything in Haskell is a thunk.

A thunk is a suspended computation: an expression that has not been evaluated yet, together with everything needed to evaluate it later. If everything in Haskell is a thunk, then nothing in Haskell is evaluated until its value is actually needed.
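
Here is a minimal sketch of this idea. It uses the standard undefined value, which crashes the program if it is ever evaluated, as a stand-in for a thunk we must never open. The names pairFirst and listLength are made up for illustration.

```haskell
-- undefined crashes the program if it is ever evaluated.
-- Below it is only stored inside data structures, as an unevaluated thunk.

-- Taking the first component of a pair never touches the second one.
pairFirst :: Int
pairFirst = fst (1, undefined)

-- length only counts the list cells; it never looks at the elements.
listLength :: Int
listLength = length [undefined, undefined, undefined]

main :: IO ()
main = do
  print pairFirst  -- 1
  print listLength -- 3
```

Both lines print normally, because none of the undefined thunks is ever forced.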

From a non-functional programmer’s perspective, a thunk may initially seem to be a useless feature. To understand why thunks are useful, consider the following Python code.

x = 5 / 0

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

Unlike Haskell, Python evaluates expressions eagerly, as soon as they are reached, so this line raises an error immediately, even though x is never used afterwards.

Haskell is different. You can string together dozens of lines of code, all of which contain an error, and no error will appear until one of those lines gets evaluated.

let x = 3 `div` 0
let y = x

-- Can even divide x and y!
let z = x `div` y

let stringX = show x

-- Evaluate stringX, which
-- evaluates show x, which evaluates x
-- which finally evaluates 3 `div` 0, causing
-- the error to occur
putStrLn stringX

In this example, Haskell does not report an error until an erroneous value is actually evaluated. A common misconception is that calling a function always forces its result to be evaluated, but that is simply not true: applying div or show to a thunk just produces another thunk.

In the example above, z is bound to x divided by y, and yet there is no error; in fact, z is never evaluated at all. It is only when stringX is evaluated by putStrLn (which prints a String) that Haskell throws the division-by-zero error.
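
The same chain of thunks can be rebuilt as a compiled program. This sketch uses evaluate and try from Control.Exception to catch the error at the exact moment it is raised; forceStringX is a hypothetical name, and asking for the length of stringX is just one convenient way to force it.

```haskell
import Control.Exception (ArithException (..), evaluate, try)

-- Rebuilds the GHCi example as a compiled program.
-- Returns the length of stringX, or the arithmetic error raised
-- at the moment stringX is finally forced.
forceStringX :: IO (Either ArithException Int)
forceStringX = do
  -- None of these bindings performs any arithmetic yet; each is a thunk.
  let x = 3 `div` 0 :: Int
      y = x
      z = x `div` y -- never demanded below, so never evaluated
      stringX = show x
  putStrLn "All bindings created; no error so far."
  -- Forcing stringX (by asking for its length) forces show x,
  -- which forces x, which finally runs 3 `div` 0.
  try (evaluate (length stringX))

main :: IO ()
main = forceStringX >>= print
```

The message is printed first, and only then does forcing stringX produce the DivideByZero error, returned here as a Left value instead of crashing the program.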

One way to imagine thunks is to think of every value and function as being wrapped in a box. No one is allowed to peek into the box until the order is given. For all Haskell knows, the box could have anything in it, and Haskell wouldn’t care. As long as there are no type conflicts, Haskell will happily pass the box along.

In the code above, Haskell is fine with dividing two errors (both x and y, when evaluated, raise a division-by-zero error). This works only because x and y are both thunks: Haskell never looks inside them, so it never discovers that they contain errors.

This gives Haskell room to optimize: work whose result is never demanded is simply never done, which can lead to large performance gains in the right situations.
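
A minimal sketch of skipping unneeded work, assuming a deliberately slow helper. Both slowCount and pick are hypothetical names chosen for illustration, and the actual savings in a real program depend on what it demands.

```haskell
-- A deliberately slow function: builds and counts a list of n elements.
slowCount :: Int -> Int
slowCount n = length [1 .. n]

-- Returns one of its arguments. Whichever argument is not chosen
-- stays an unevaluated thunk, so its cost is never paid.
pick :: Bool -> Int -> Int -> Int
pick useCheap cheap expensive = if useCheap then cheap else expensive

main :: IO ()
main =
  -- Returns immediately: slowCount 1000000000 is passed in but never forced.
  print (pick True 1 (slowCount 1000000000))
```

In an eagerly evaluated language the argument would be computed before pick is even called; here the expensive argument is only evaluated if pick actually returns it.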

Imagine an expensive function, F, that takes one parameter, runs for 10 seconds, and returns the value 5. Suppose the same function also exists in Python.

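Now suppose a program needs F’s result on the same input ten times. Python, which runs the function body on every call, would take about 100 seconds. Haskell can finish in about 10 seconds, provided the ten uses share a single thunk (for example, because the result is bound to a name once): the first use forces the thunk, the thunk is overwritten with its value, and every later use simply reads the stored result. This sharing is one way Haskell’s functional properties and lazy evaluation let it do less work than Python.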
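
Sharing can be observed with the trace function from Debug.Trace, which prints a message to stderr when the expression it wraps is evaluated. In this sketch, expensiveF is a hypothetical stand-in for F that returns 5 immediately instead of running for 10 seconds.

```haskell
import Debug.Trace (trace)

-- Hypothetical stand-in for F. trace prints its message to stderr
-- each time this body is actually evaluated; a real F would spend
-- 10 seconds here before returning 5.
expensiveF :: Int -> Int
expensiveF n = trace ("evaluating expensiveF " ++ show n) 5

-- expensiveF 2 is bound to a single name, so all ten uses share one
-- thunk: it is evaluated once, and the message appears only once.
sharedTotal :: Int
sharedTotal =
  let r = expensiveF 2
   in sum (replicate 10 r)

main :: IO ()
main = print sharedTotal
```

Writing expensiveF 2 out ten separate times may print the message once or ten times, depending on how GHC optimizes the program (for example, whether it merges or floats out the identical expressions). Binding the shared result to a name is the reliable way to evaluate it only once.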

In Python, functions are allowed to have side effects. This means that if a Python function is run ten times with the same input, it can give ten different results. In Haskell, on the other hand, a function run ten times with the same input always gives the same result. So if there are multiple copies of the same function called on the same input, Haskell can be certain that those results are all the same. This is known as referential transparency, and it is a defining feature of purely functional languages such as Haskell.

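This property, combined with Haskell’s lazy evaluation, is what makes sharing safe. When F applied to some input is bound to a name, Haskell evaluates that call at most once, caches the result in the thunk, and every later use of the name reads the cached result. Referential transparency also lets an optimizing compiler such as GHC merge identical expressions, such as two copies of F applied to the same input in the same scope. However, this is an optimization rather than a guarantee: GHC does not automatically memoize every call to a function, so caching results across separate calls has to be arranged explicitly.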
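
To get memoization that is guaranteed rather than left to the optimizer, a common technique is to store results in a lazy data structure, so each entry is a thunk computed at most once. This sketch uses the Fibonacci numbers as a hypothetical expensive function; fibs and fib are names chosen for illustration, and because list indexing with !! is linear, a real memo table would more likely use an array or a tree.

```haskell
-- Memo table: an infinite, lazy list whose n-th element is fib n.
-- Each element starts as a thunk and is overwritten with its value
-- the first time it is demanded, so it is computed at most once.
fibs :: [Integer]
fibs = map fib [0 ..]

-- fib reads smaller results from the shared table instead of
-- recursing directly, so each index is computed only once
-- (plus the cost of walking the list to reach it).
fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)

main :: IO ()
main = print (fib 90)
```

A naive doubly recursive fib 90 would make an astronomical number of calls; with the shared table it returns almost instantly.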

What About Infinite Lists?

One consequence of everything being a thunk in Haskell is that Haskell can create infinite lists and pass them around. Haskell can process and manipulate these lists as if they were finite, because only the elements a program actually demands are ever computed.

For example,

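let x = [1..]
let y = [1..]

-- Concatenate x with y. Because x never ends,
-- the elements of y are never reached.
-- Printing all of x ++ y would run forever,
-- so take only the first 10 elements:
take 10 (x ++ y)
-- [1,2,3,4,5,6,7,8,9,10]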

Here, x is bound to the infinite list counting up from 1, and y is bound to exactly the same infinite list. Haskell is perfectly fine with storing these infinite data structures, because it does not actually evaluate them until it needs to. When they are finally evaluated, they behave exactly like infinite lists, because that is exactly what x and y are; Haskell simply computes only as many elements as the program asks for.
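
To get the alternating sequence 1, 1, 2, 2, 3, 3, ... from these two infinite lists, they have to be interleaved rather than concatenated. This is a sketch with a hypothetical helper, interleave, which is not part of the Prelude or Data.List.

```haskell
-- Alternates elements from two lists: the first element of the first
-- list, then the first of the second, then the second of the first, ...
-- It works on infinite lists because each step only needs one element.
interleave :: [a] -> [a] -> [a]
interleave (a : rest) bs = a : interleave bs rest
interleave [] bs = bs

-- Two infinite lists, as in the example above.
x, y :: [Integer]
x = [1 ..]
y = [1 ..]

main :: IO ()
main = print (take 10 (interleave x y)) -- [1,1,2,2,3,3,4,4,5,5]
```

Swapping the argument order on each recursive call is what makes the two lists take turns, and take 10 ensures only ten elements are ever computed.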

In Haskell, treating everything as a thunk lets programs skip work they never need, and also gives the programmer the ability to store infinite data structures. So the next time someone says, “Everything in Haskell is X”, gently remind yourself that unless X is a thunk, they’re most likely wrong.
