gdritter repos documents / master posts / resources-laziness-and-cps.md
master

Tree @master (Download .tar.gz)

resources-laziness-and-cps.md @masterview rendered · raw · history · blame

This is a question posed by Aaron Levin on Reddit:

> In WAI, Application has the type
> `Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived`.
> There is a note about how historically this was
> `Request -> ResourceT IO Response`.
> There is the following note on Hackage about this change:
>
> > Note that, since WAI 3.0, this type is structured in continuation passing
> > style to allow for proper safe resource handling. This was handled in the
> > past via other means (e.g., ResourceT).
>
> A naive implementation might have `type Application = Request -> IO Response`.
> I would assume one could still handle appropriate resource cleanup (both on
> the client side and on the Wai/warp side) in this naive implementation, but
> I'm assuming I'm wrong in thinking that. So:
>
> 1. Why is Application defined in CPS style?
> 2. Is there an example of how resources cannot be cleaned up appropriately
>    using a naive implementation above?
> 3. Are there other reasons for the CPS style? (E.g. allows for more
>    efficiency as network resources can be freed while your handler is running)

Like
[my last post](http://blog.infinitenegativeutility.com/2016/7/writer-monads-and-space-leaks),
this is a piece of folklore that's been described in papers, mailing lists, &c,
but I don't know of a single resource to point people to, so I'll explain it
here. Unlike my last post, I'll do it operationally!

I'm gonna explain some preliminary details about laziness, the `trace` function,
and the `bracket` function; if you're already familiar with these, go ahead and
skip to the section [A Naïve Application Type](#a-naïve-application-type).

# Some Preliminary Explanations

A big part of this post involves the fact that Haskell is lazy: when we write
something in Haskell, it won't be evaluated until it's absolutely necessary
that it be readywhich might be never! For example, an expression like

~~~{.haskell}
>>> let f x y = x in f 5 undefined
5
~~~

will happily ignore our use of `undefined` and return the value we wanted. If
we want to _make sure_ that a given expression is evaluated, we need to
use it somehow. One way of using is to pattern-match on it: that needs to
evaluate it at least enough that we know what the outermost layer of data
looks like. Let's change the above example a little bit:

~~~{.haskell}
>>> let f x y = case y of { () -> x } in f 5 undefined
*** Exception: Prelude.undefined
~~~

Now, we're pattern-matching on the `y` value, which _forces_ it tothat is,
it causes it to be evaluated before we can evaluate the branch associated
with that `case`-expression.

In generalwith the exception of code that loops forever, or inherently
unsafe values like `undefined`—this should be a detail we don't notice!
In pure code that doesn't loop forever, evaluation order doesn't matter
at all, and imperative code in Haskell uses monads like `IO`. Because
each step in a monad has to be evaluated before the next step can proceed,
those computations appear to be strict, and the evaluation order of other
computations is largely opaque.

However, Haskell has a few features we can use to peek into what's going
on under the surface. One of them is the `trace` function, found in `Debug.Trace`,
which has the signature

~~~{.haskell}
trace :: String -> a -> a
~~~

This takes a string and any other value, and will print the string
_as soon as the value is forced_, not before, not after. That makes it
of sometimes questionable utility for debugging, as there's no guarantee
that `trace` will even run, much less run at the point we expect!

~~~{.haskell}
>>> let f x y = x in f 5 (trace "Hello!" undefined)
5
~~~

But because `trace` will print _exactly when_ when its result is needed,
we can use it to peek into how Haskell chooses to evaluate our code.

# Bracketing for Resources

The module `Control.Exception.Base` exports a function called `bracket`:

~~~{.haskell}
bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
~~~

This function is used for managing resources that we need to clean up,
like file handles or large pieces of data we don't want to keep in
memory. The first argument is an `IO` computation that produces a
resource, the second one is the cleanup function for that resource,
and the third one is a function that uses that resource to produce a
value prior to cleanup:

~~~{.haskell}
main :: IO ()
main = bracket
  (openFile "tmp.txt" ReadMode)  -- opening a resource: here, a file
  hClose                         -- closing the resource
  (\ handle -> do                -- what to do with the resource
       contents <- hGetContents handle
       putStrLn contents)
~~~

The common function `withFile` is implemented straightforwardly
in terms of `bracket` almost exactly like I used it above. The
major utility of `bracket` is that it knows how to correctly handle
exceptions that occur while running computations: if an exception
gets raised while the resource is being used, it will nevertheless
ensure that the resource gets closed appropriately.

# A Naïve Application Type

Let's try writing a toy version of WAI that doesn't actually do
anything. We're going to pretend that we're writing some kind of
server where some responses might depend on scarce resources,
so we want to clean up that resource as soon as we can.

Let's start with importing the functions above and
defining some dummy types:

~~~{.haskell}
import Control.Exception.Base (bracket)
import Debug.Trace (trace)

-- Our dummy requests and responses
data Request = Request
data Response = Response

-- Our dummy resource
data SomeResource = SomeResource

-- Our naive @App@ type
type App = Request -> IO Response
~~~

Let's write our `App` handler first. Because we're trying to mimic a
real application, we pattern-match on the `Response` value to force
Haskell to evaluate it before returning:

~~~{.haskell}
runApp :: App -> IO ()
runApp app = do
  response <- app Request
  case response of
    Response -> putStrLn "Sending response"
~~~

Now, let's create an `App` that uses some "resource". In this case, I'm
going to fake it, of course. Additionally, I'm going to include a call
to `trace` so that we can be sure of when our `Response` actually
gets evaluated:

~~~{.haskell}
myApp :: App
myApp request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          return (trace "EVALUATING RESPONSE" Response)
~~~

And now let's wrap this all up:

~~~{.haskell}
main :: IO ()
main = runApp myApp
~~~

When I run this program on my machine here's what I get:

~~~
$ runhaskell myapp.hs
Creating resource
Destroying resource
EVALUATING RESPONSE
Sending response
~~~

What does this mean? Well, this means that we end up destroying
the resource we are trying to use _even before the response gets
constructed_. That's not good! In this case, the resource is just
a dummy value, but if it were instead, say, a file handle, then this
program would close the file handle before we ever tried to read from
it!

But we're using `bracket` and everything! That _should_ mean that
the action we're passing to `bracket`—the `respond` function we
defined up above—should run before the resource gets destroyed!
Shouldn't it?

As a matter of fact, it is. The `IO` action is getting run, and the `IO`
steps are getting evaluated eagerly. If we rewrite the `myApp` function
slightly to insert a print statement there

~~~{.haskell}
myApp' :: App
myApp' request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          putStrLn "Responding to request" -- A new print statement
          return (trace "EVALUATING RESPONSE" Response)
~~~

and then run it again, we'll see that the `respond` function _is_ getting
properly bracketed between resource creation and destruction:

~~~
$ runhaskell myapp.hs
Creating resource
Responding to request
Destroying resource
EVALUATING RESPONSE
Sending response
~~~

The problem is that while `IO` steps are evaluated eagerly, the values
that get `return`ed are subject to the normal rules of laziness:
_they aren't forced until we use them_. In this case, we don't actually
force the `Response` value until the `case`-expression in `runApp`,
which doesn't happen until after the entire `myApp` function—`bracket`
and allhas finished running.

So the _problem_ is that, while `bracket` can ensure that `IO` actions
are properly bookended with resource-related code, it can't make the
same guarantees with respect to the _values_ inside it. We need some
way of forcing the value _before_ the cleanup code gets run. We can
rewrite `myApp` to do this:

~~~{.haskell}
myStricterApp :: App
myStricterApp request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          putStrLn "Responding to request" -- A new print statement
          let response = trace "EVALUATING RESPONSE" Response
          case response of  -- pattern-match to force evaluation
            Response -> return response
~~~

This produces the output we want:

~~~
$ runhaskell myapp.hs
Creating resource
Responding to request
EVALUATING RESPONSE
Destroying resource
Sending response
~~~

But this has a lot of drawbacks: it's ugly, it inserts apparently
extraneous case statements, and I've papered over the fact that, if we
had a more complicated data structure, we'd need to force evaluation of
_the entire thing_, not just the outermost structure. But even worse, it's
not a general solution: WAI is a library that exports the equivalent of
`runApp` and expects users to write the equivalent of `myApp`, which means
we'd be forcing every single user to _remember_ to force their `Response`
values. If a user forgets to insert the ugly and opaque code, then their
program might start crashing oreven worserunning but nonsensical results.
What we want is some way of _ensuring_ that any `App` _must_ force its
`Response` before cleaning up any resources.

# CPS To The Rescue

The core idea of continuation-passing is that a continuation is a function
that represents "what to do next": a continuation is a callback function
that expects the result of some computation. Let's rewrite our program
but with an `App` type that looks like the CPS-ey type WAI actually uses:

~~~{.haskell}
-- Our modified @App@ type
type App = Request -> (Response -> IO ()) -> IO ()
~~~

Now, our `App` takes two arguments—the `Request`, as before, as well as
a function that takes a `Response` and produces an `IO` action. The logic
of "sending" the response now gets moved into the callback function we
pass to the `App`:

~~~{.haskell}
runApp :: App -> IO ()
runApp app =
  app Request
      (\ response -> case response of
           Response -> putStrLn "Sending response")
~~~

Our implementation of `myApp` is _almost_ the same, except we take an
extra callback parameter, and instead of returning our `Response`, we
pass it to that callback:

~~~{.haskell}
myApp :: App
myApp request cb = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          cb (trace "EVALUATING RESPONSE" Response)
~~~

If I run _this_ version of the program, I get this output:

~~~
$ runhaskell myapp.hs
Creating resource
EVALUATING RESPONSE
Sending response
Destroying resource
~~~

This is much better! This _ensures_ that the resource outlives the
`Response`, and because now the callback is responsible for forcing
the `Response`, it can guarantee that it gets done. This lets us
ensure that propery of _any_ `App`! This is something we _couldn't_
ensure with the previous one, because it would have required some
way of reaching "inside" the `App` to force the return value. By
having a callback that the user has to call, we can force it ourselves
without having to hope that the user does!

well, except there's that remaining little issue: using the
implementation above, the user doesn't _have_ to call the callback,
so we could write an `App` that produces no `Response` at all:

~~~{.haskell}
myBadApp :: App
myBadApp _ _ = return ()
~~~

We can fix this by creating a new typeWAI calls it `ResponseReceived`—and
hiding every way to construct it except using the callback. That way, the user
has to use the callback _at least_ once in order to get a `ResponseReceived`
value, which they can return from their `App`.
(We will have the problem that we can call the callback _more than_
once: we could keep working on solutions, but we'd have to move to more
complicated solutions and WAI ends up suggesting that you just not do that.)

So this is why the CPS style is indeed necessary for resource management here:
it's not enough to use `bracket`, you have to use `bracket` _and_ have the
appropriate amount of strictness, and the CPS style helps you achieve that.