One pain point for certain Haskell programs is Cabal's `data-files`
feature, which is awkward and weird, especially when it comes to
_libraries_ which make use of data files. I mentioned to a coworker
that, once the
[lightweight module system Backpack](http://plv.mpi-sws.org/backpack/)
is implemented in Haskell, we could tackle a `data-files` mechanism
in a more convenient way by using module-level mixins. Just for
general reference, I'll sketch out what that might look like below.

I should stress that I'm describing a **possible solution**, and not
*the* solution. I'm by no means indicating that this is the best way
of solving the problems with data files, and I have absolutely no
indication that anyone else would want to solve the problem like this.
That said, I think this is an interesting and motivated design, and
I'd be happy to discuss its strengths and weaknesses as well as other
possible designs that address the same issues.

Right now, I'm using Edward Yang's blog post
[A Taste Of Cabalized Backpack](http://blog.ezyang.com/2014/08/a-taste-of-cabalized-backpack/)
as my primary guide to Backpack-in-practice. I don't have a Backpack-enabled
GHC and Cabal on hand, so I haven't actually _run_ any of this; for now, treat
it as pseudocode. I also assume familiarity with
Cabal's data files support; if you're in need of an introduction
or a refresher, you should read the post
[Adding Data Files Using Cabal](http://neilmitchell.blogspot.com/2008/02/adding-data-files-using-cabal.html).

## An Abstract Signature for Data Files

In our hypothetical Backpack-enabled-data-files-support future, we
start by creating a _signature_ that corresponds to
the generated `Paths_whatever` module. To this end, we can create
an `.hsig` file with a declaration like this:

```.haskell
module Dist.DataFiles (getDataFileName) where
  getDataFileName :: FilePath -> IO FilePath
```

This defines an abstract module called `Dist.DataFiles` that
exposes a single function, `getDataFileName`, with no actual
implementation. We can expose this signature by creating
a package, `data-files-sig`, that exposes only this signature:

```
name:               data-files-sig
version:            1.0
indefinite:         True
build-depends:      base
exposed-signatures: Dist.DataFiles
```

This would be a standard package (maybe even part of `base`) that
can be consistently and universally relied on by libraries that
require some kind of data file support.
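For comparison, here is a rough sketch of the kind of lookup a generated `Paths_whatever` module performs today, written against the same type the signature declares. It is loosely modeled on Cabal's generated code, which lets a `<package>_datadir` environment variable override the directory fixed at configure time. The install path and the helper name `resolveDataFile` are made up for illustration.

```haskell
import System.Environment (lookupEnv)
import System.FilePath ((</>))

-- Made-up stand-in for the directory Cabal would fix at configure time.
installedDataDir :: FilePath
installedDataDir = "/usr/share/sample-library"

-- Pure core: an override directory, when present, wins over the default.
resolveDataFile :: Maybe FilePath -> FilePath -> FilePath
resolveDataFile override name = maybe installedDataDir id override </> name

-- Same type as the Dist.DataFiles signature. The variable name follows the
-- <package>_datadir convention, with '-' in the package name turned into '_'.
getDataFileName :: FilePath -> IO FilePath
getDataFileName name = do
  override <- lookupEnv "sample_library_datadir"
  pure (resolveDataFile override name)
```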

## Creating A Library With Data Files

Now, let's create a library that needs a data file. In this case,
the library will do nothing but read and return the contents of
that data file:

```.haskell
module Sample.Library (getSampleFile) where

import Dist.DataFiles (getDataFileName)

getSampleFile :: IO String
getSampleFile = getDataFileName "sample-file" >>= readFile
```

Now we need to create a corresponding `.cabal` file for this
library. Because we're using `Dist.DataFiles`, we need to import
that signature from the `data-files-sig` package.
Importantly, we still don't have an
_implementation_ for `getDataFileName`. Because of that, our
package is still abstract, or in Backpack-speak, `indefinite`:

```
name:            sample-library
indefinite:      True
build-depends:   base, data-files-sig
exposed-modules: Sample.Library
```
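For contrast, the closest plain-Haskell equivalent without Backpack is to pass the implementation in as an ordinary argument. Here's a minimal sketch of that, where `DataFiles`, `getSampleFileWith`, and `optDataFiles` are hypothetical names rather than anything in the proposal:

```haskell
import System.FilePath ((</>))

-- Hypothetical stand-in for the Dist.DataFiles signature: a record
-- holding the single operation that the signature declares.
newtype DataFiles = DataFiles
  { dataFileName :: FilePath -> IO FilePath }

-- The library function from above, now taking its implementation as an
-- argument instead of importing it from an unfilled signature.
getSampleFileWith :: DataFiles -> IO String
getSampleFileWith df = dataFileName df "sample-file" >>= readFile

-- One caller-chosen implementation, rooted at a fixed directory.
optDataFiles :: DataFiles
optDataFiles = DataFiles (\path -> pure ("/opt/sample-application" </> path))
```

The cost of this style is that every function needing a data file has to thread the argument through, which is the plumbing a signature lets the library skip.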

## Depending On A Library With Data Files

In order to write an application that uses `sample-library`, we
need to give it a module that's a concrete implementation of the
`Dist.DataFiles` signature. In this case, let's create an
implementation manually as part of our application.

First, let's write a small application that uses `sample-library`:

```.haskell
module Main where

import Sample.Library (getSampleFile)

main :: IO ()
main = getSampleFile >>= putStrLn
```

We still don't have that concrete implementation for `getDataFileName`,
though, so let's write a simple module that exports the same name with
the same type:

```.haskell
module MyDataFilesImpl (getDataFileName) where

import System.FilePath ((</>))

getDataFileName :: FilePath -> IO FilePath
getDataFileName path = pure
  ("/opt/sample-application" </> path)
```

Now, when we write our `.cabal` file for this application, we also
need to specify that we want to use `MyDataFilesImpl` as the concrete
implementation of `Dist.DataFiles` for `sample-library`. That
means our `.cabal` file will look like this:

```
name: sample-application
build-depends:
  base,
  filepath,
  sample-library (MyDataFilesImpl as Dist.DataFiles)
```

Now, all our abstract signatures are filled in, so this application
is no longer `indefinite`, and we as developers have a convenient
way of telling `sample-library` where we want it to look for its
data files. In fact, one advantage of this system for data files
is that we could import two libraries that both depend on the
`Dist.DataFiles` signature but tell them to look in two different
places for their data files, like this:

```
name: other-application
build-depends:
  base,
  lib-one (OneDataFilesImpl as Dist.DataFiles),
  lib-two (AnotherDataFilesImpl as Dist.DataFiles)
```
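Each of those implementation modules could be tiny. As a sketch, here are the functions that `OneDataFilesImpl` and `AnotherDataFilesImpl` might each export as `getDataFileName`, renamed so both fit in one file. The helper `rootedAt` and both directories are assumptions.

```haskell
import System.FilePath ((</>))

-- Hypothetical shared helper: resolve every name under one directory.
rootedAt :: FilePath -> FilePath -> IO FilePath
rootedAt dir name = pure (dir </> name)

-- What OneDataFilesImpl and AnotherDataFilesImpl might each export as
-- getDataFileName (renamed here so both fit in one file).
oneGetDataFileName, anotherGetDataFileName :: FilePath -> IO FilePath
oneGetDataFileName     = rootedAt "/opt/lib-one"
anotherGetDataFileName = rootedAt "/opt/lib-two"
```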

If there are reasonable default implementations for `Dist.DataFiles`,
we could also put those on Hackage and reuse them in much the
same way.
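One candidate for such a reusable default is an implementation that looks next to the running executable, which keeps an installed application relocatable. This is a sketch under assumptions: the `data` directory layout and the helper `besideExecutable` are invented here, while `getExecutablePath` comes from `System.Environment` in `base`.

```haskell
import System.Environment (getExecutablePath)
import System.FilePath (takeDirectory, (</>))

-- Pure helper: data files live in a "data" directory beside the binary.
-- This layout is an assumption, not anything Cabal prescribes.
besideExecutable :: FilePath -> FilePath -> FilePath
besideExecutable exe name = takeDirectory exe </> "data" </> name

-- A possible reusable Dist.DataFiles implementation: resolve names
-- relative to wherever the running executable was installed.
getDataFileName :: FilePath -> IO FilePath
getDataFileName name = do
  exe <- getExecutablePath
  pure (besideExecutable exe name)
```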

## A Final Sprinkling Of Magic

So far, I'm still missing a major part of Cabal's `data-files`
support: namely, we want to shunt the responsibility from the
developer to Cabal, so that we have support for things like
relocatable builds. So in a final bit of handwaving, let's
stipulate that our tooling in this hypothetical future
has a special case for applications that
leave the `Dist.DataFiles` signature unfilled: Cabal could notice
this situation and fill that signature in with a sensible implementation
based on the commands and configuration we're using.

For example, if my `.cabal` file for `sample-application` above
_didn't_ supply a concrete implementation for `Dist.DataFiles`, then
a default one could be chosen for development that's equivalent to:

```.haskell
-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where

getDataFileName :: FilePath -> IO FilePath
getDataFileName = pure
```

That is, the application will just look for the file in the
current directory.

If the developer started preparing the package for release, and
changed the configuration appropriately, then the automatically
generated `getDataFileName` could be modified to reflect that,
replacing the automatically generated code with something more
like:

```.haskell
-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where

import System.FilePath ((</>))

getDataFileName :: FilePath -> IO FilePath
getDataFileName path =
  pure ("/usr/share/sample-application" </> path)
```

This would admittedly be a little bit "magical", but it would
be a small and easy-to-explain bit of magic, and it would have
the advantage of affording a kind of flexibility that the
current approach to data files lacks.
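One way to picture the choice Cabal would be making is as a function from configuration to implementation. In this sketch, `BuildMode` and `dataFileLookup` are hypothetical names, and each equation corresponds to one of the two generated modules above:

```haskell
import System.FilePath ((</>))

-- Hypothetical configurations the tooling might distinguish between.
data BuildMode
  = Development       -- look in the current directory
  | Release FilePath  -- look under a fixed install prefix

-- The getDataFileName that would be generated for each configuration.
dataFileLookup :: BuildMode -> FilePath -> IO FilePath
dataFileLookup Development   name = pure name
dataFileLookup (Release dir) name = pure (dir </> name)
```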

## Is This How It's Actually Gonna Work?

Probably not! Backpack is still a ways out, and this would
require opt-in from many parts of the Haskell ecosystem, and
the problem it solves could probably also be solved in numerous
other ways I haven't considered. But this post describes a
point in the design space that I think is at least worth weighing!