gdritter repos documents / b4833e9
two more unfinished posts Getty Ritter 8 years ago

I did a terrible thing. It's something lots of programmers do at
some point in their lives, but I had hoped to avoid doing it,
and so I was kind of shocked when I realized what had happened.

I wrote a key-value store library.

Okay, so, I didn't actually write the key-value store _itself_.
What happened was, I wanted to use some kind of simple on-disk
key-value-store library like Berkeley DB or Tokyo Cabinet
in a Haskell program. There are some _really nice_ bindings
to these kinds of libraries in languages like Python:

~~~.python
from tokyocabinet import *

# using Tokyo Cabinet's b-tree implementation
bdb = BDB()
bdb.open("sample.tcb", BDBOCREAT)
bdb["foo"] = "bar"
bdb.close()
~~~

Almost all the existing Haskell bindings were thin wrappers
over the C implementations. For example, here's an analogous
program in Haskell, using Berkeley DB:

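~~~.haskell
{-# LANGUAGE OverloadedStrings #-}

import Database.Berkeley.Db

main :: IO ()
main = do
  env <- dbEnv_create []
  -- assuming the binding's `dbEnv_open flags mode env homeDir` argument
  -- order; the environment handle has to be passed explicitly
  dbEnv_open [DB_CREATE, DB_INIT_MPOOL] 0 env "."
  db <- db_create [] env
  db_open [DB_CREATE] DB_BTREE 0 db Nothing "sample.bdb" Nothing
  db_put [] db Nothing "foo" "bar"
  db_close [] db
  dbEnv_close [] env
~~~
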
Yeesh. There's a lot of boilerplate for what is fundamentally a
simple operation: "Open a database and store this mapping."

## The Basics of Tansu

So I wrote a simple wrapping library. Here's an analogous program
with my Tansu library:

~~~.haskell
import Database.Tansu
import Database.Tansu.Backend.BerkeleyDb

main = withBerkeleyDb "sample.bdb" $ \db ->
  run db ("foo" =: "bar")
~~~

This is a pretty huge improvement in terms of readability and
code size. But there's more! Keys and values are transparently
converted into strings of bytes using the `Serialize` typeclass
from the `cereal` library. Consequently, we can store values of
any type, and index by keys of any type, so long as those types
have `Serialize` instances:

~~~.haskell
{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}

import Control.Monad (forM_)
import Data.Serialize (Serialize)
import Database.Tansu
import Database.Tansu.Backend.BerkeleyDb
import GHC.Generics (Generic)

-- Define a `Person` type with a `Serialize` instance
data Person = Person
  { fullName      :: String
  , currentAge    :: Int
  , favoriteColor :: String
  } deriving (Eq, Show, Generic, Serialize)

-- Create our people list
people :: [(String, Person)]
people = [ ("alex",  Person "Alex Xie" 22 "mauve")
         , ("blake", Person "Blake MacPool" 33 "chartreuse")
         , ("cal",   Person "Cal Lopez" 44 "pearl")
         ]

-- Store every (key, value) pair in the database
main :: IO ()
main = withBerkeleyDb "sample.bdb" $ \db ->
  run db $ forM_ people (\ (k,v) -> k =: v)
~~~
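
To make the byte conversion concrete, here is a minimal sketch of the round trip that `cereal` provides for a type with a derived `Serialize` instance. It uses only `cereal`'s `encode` and `decode`; the `Pet` type is hypothetical, and whether Tansu calls these functions in exactly this way internally is an assumption.

```haskell
{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}

import qualified Data.ByteString as BS
import Data.Serialize (Serialize, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical record, standing in for any value stored through Tansu.
data Pet = Pet
  { petName :: String
  , petAge  :: Int
  } deriving (Eq, Show, Generic, Serialize)

-- Encode a value to bytes and decode it again, using only cereal.
roundTrip :: Pet -> Either String Pet
roundTrip = decode . encode

main :: IO ()
main = do
  let pet = Pet "Rex" 7
  -- The encoded form is an ordinary strict ByteString.
  putStrLn ("encoded size in bytes: " ++ show (BS.length (encode pet)))
  -- Decoding the encoded bytes gives back the original value.
  print (roundTrip pet == Right pet)
```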

I've glossed over another part, too: Tansu is also parametric
in the _backend_. I've been using the `BerkeleyDb` backend, but
the `Tansu` operations are written in an abstract way that allows
backends to be swapped out without requiring any other changes
to the program. The Berkeley DB backend is actually kept in
the separate package `tansu-berkeleydb`[^gpl], while the core operations
are kept in the `tansu` package. The `tansu` package exposes two
very basic backends: the `Filesystem` backend, which represents
a key-value mapping as files in a directory, and the `Ephemeral`
backend, which doesn't save the mapping but just keeps it in memory
and throws it away at the end.

[^gpl]: This has the extra advantage that, while the `tansu-berkeleydb`
library must be released under the GPL because Berkeley DB is also
under the GPL, the `tansu` package itself can be released under
a less restrictive BSD license.

In addition to the `tansu-berkeleydb` backend, I've also written
one that uses a table in a SQLite database to store its data.
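
One way to picture what being parametric in the backend might look like is a record of storage operations that client code is written against. The sketch below is an assumption for illustration, not Tansu's actual internals: the `Backend` type, its field names, and `newEphemeral` are all hypothetical, and it uses plain `String`s rather than serialized bytes.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical backend interface: the operations any store must provide.
data Backend = Backend
  { putRaw :: String -> String -> IO ()
  , getRaw :: String -> IO (Maybe String)
  }

-- An in-memory backend in the spirit of `Ephemeral`: the mapping lives
-- in an IORef and disappears when the program exits.
newEphemeral :: IO Backend
newEphemeral = do
  ref <- newIORef Map.empty
  pure Backend
    { putRaw = \k v -> modifyIORef' ref (Map.insert k v)
    , getRaw = \k -> Map.lookup k <$> readIORef ref
    }

-- Client code only mentions the interface, never a concrete store.
storeAndFetch :: Backend -> IO (Maybe String)
storeAndFetch backend = do
  putRaw backend "foo" "bar"
  getRaw backend "foo"

main :: IO ()
main = do
  backend <- newEphemeral
  result <- storeAndFetch backend
  print result  -- prints: Just "bar"
```

Swapping in a different store would mean supplying another `Backend` value; `storeAndFetch` stays unchanged.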

## Some Drawbacks and Caveats

The goal of `tansu` was to build a quick and easy library for use
in new Haskell programs. Consequently, the library is designed in
a way that makes it fast and easy to use in Haskell, but at the
cost of making it more difficult to use across languages or with
existing key/value stores.

A concrete example of this is that the serialization used is the
`cereal` library's serialization routines, which means that, even
when storing plain ASCII `String`s for keys and values, the actual
bytes that are stored are not the same sequence of bytes as the
raw ASCII strings. They are first run through `cereal`'s `encode` function,
which adds a 64-bit length to the start:

~~~
0000 0000 0000 0003 666f 6f
[  64-bit length  ] [chars]
~~~
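
The dump above can be reproduced by calling `cereal`'s `encode` on a `String` directly. This is a small sketch using only `cereal`, `bytestring`, and `base`; the `hexByte` helper is hypothetical and is not part of Tansu.

```haskell
import qualified Data.ByteString as BS
import Data.Serialize (encode)
import Data.Word (Word8)
import Numeric (showHex)

-- Render one byte as exactly two lowercase hex digits.
hexByte :: Word8 -> String
hexByte w =
  let digits = showHex w ""
  in if length digits == 1 then '0' : digits else digits

-- The key "foo" as cereal serializes it: a big-endian 64-bit element
-- count followed by the characters themselves.
encodedKey :: BS.ByteString
encodedKey = encode ("foo" :: String)

main :: IO ()
main = putStrLn (unwords (map hexByte (BS.unpack encodedKey)))
-- prints: 00 00 00 00 00 00 00 03 66 6f 6f
```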

In order to use a `tansu`-generated database from another language,
you would probably have to reimplement the serialization and
deserialization logic from the `Serialize` typeclass, which would
be a non-trivial amount of work. One way around this is to use the
`RawString` newtype wrapper exposed in `Database.Tansu.RawString`,
which is a `ByteString` whose `Serialize` instance simply dumps
and reads the full raw bytestring. This violates assumptions that
other `Serialize` instances rely on (for example, a decoder that
reads all remaining input cannot be followed by another field), so
it should be used with caution.
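
A wrapper of this kind can be sketched with `cereal`'s `putByteString`, `remaining`, and `getByteString`. The `RawBytes` name below is hypothetical, and this is not necessarily how `RawString` is implemented in Tansu; it only illustrates the idea and the caveat.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Serialize (Serialize (..), decode, encode)
import Data.Serialize.Get (getByteString, remaining)
import Data.Serialize.Put (putByteString)

-- Hypothetical wrapper: a ByteString that is stored with no framing.
newtype RawBytes = RawBytes ByteString
  deriving (Eq, Show)

instance Serialize RawBytes where
  -- Write the bytes as-is, with no length prefix.
  put (RawBytes bs) = putByteString bs
  -- Read every byte that is left in the input.
  get = RawBytes <$> (remaining >>= getByteString)

main :: IO ()
main = do
  -- The stored form is exactly the original bytes.
  print (encode (RawBytes "foo") == "foo")
  -- On its own, the wrapper round-trips.
  print (decode "foo" == Right (RawBytes "foo"))
  -- The caveat: inside a pair, the first component reads both payloads
  -- and the second comes back empty.
  print (decode (encode (RawBytes "foo", RawBytes "bar"))
           == Right (RawBytes "foobar", RawBytes ""))
```

All three checks print `True`, which shows both why the raw form is easy to read from other languages and why it does not compose inside larger serialized structures.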