NDBL

NDBL is a basic config file format based roughly on the config format of Plan 9's ndb.

Short Example

# A short exemplary snippet of NDBL
fileset=
  file=/home/gdritter/english.txt
  file=/home/gdritter/french.txt
  file=/home/gdritter/italian.txt
default=eng

remote=main
  ip=192.168.1.30
  user=guest
  key=public.key

Rationale

NDBL is a very straightforward configuration format, which means it is not always appropriate for situations in which large amounts of data are needed. It cannot represent arbitrary hierarchical structures like JSON, nor does it have the wealth of data types that YAML does. However, it is a very simple format to both produce and implement (both by hand and in code), and its simplicity is one of its major virtues. Other options in this space, with more structure and extant tooling, include YAML and TOML, both of which are significantly more complicated than NDBL.

One major advantage of NDBL over other configuration formats is that the NDBL grammar is a regular language, meaning it can be parsed in constant space by a finite state machine. While not intended for data storage, NDBL does possess the two qualities which are important for a data storage format: it is self-describing, meaning that you are guaranteed to be able to derive the internal representation of the external (serialized) document, and it is round-tripping, meaning that if we serialize an internal representation to a document and then convert back to an internal representation, we will obtain an identical internal representation. Note, for example, that neither XML nor YAML necessarily has this property in practice.

As a pleasant but unintended side effect, an executable bash file that does nothing but define environment variables is often trivially an NDBL file.

Structure of an NDBL Document

All NDBL documents consist of a sequence of multisets of key-value pairs. All data is represented as text; it is the responsibility of the library user to parse numeric data, boolean data, &c, during use. NDBL requires the use of UTF-8 as input and produces UTF-8 chunks when parsed.
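Because every value arrives as text, any typed interpretation happens in user code. The following is a minimal sketch in Python, assuming a parsed document is held as a list of groups, each a list of (key, value) string tuples. The helper names lookup, as_int and is_set are hypothetical and do not belong to any NDBL library.

```python
# Assumed in-memory form of a parsed document: a list of groups,
# each group a list of (key, value) pairs, all values being text.
groups = [[("host", "hg-remote"), ("port", "22"), ("portforwarding", "")]]

def lookup(group, key):
    """Return every value stored under `key`, in document order."""
    return [v for k, v in group if k == key]

def as_int(group, key):
    """Interpret the single value under `key` as an integer.

    Raises ValueError if the key is absent, repeated, or not numeric.
    """
    (value,) = lookup(group, key)
    return int(value)

def is_set(group, key):
    """Treat the presence of a key as a boolean flag (one possible convention)."""
    return bool(lookup(group, key))

port = as_int(groups[0], "port")
forwarding = is_set(groups[0], "portforwarding")
```

Whether an empty value means "true", "false" or "unset" is a decision for the application; NDBL itself only records that the key is present.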

A comment is introduced by any whitespace (including newlines) followed by a pound sign (#) and lasts until the end of the line. This means that a key cannot begin with the # character, but a # character can occur elsewhere as a constituent of a key-value pair, including as a trailing character.

A key-value pair consists of a key of length at least one, followed by an equals sign (=) and then by a value of length zero or more. The key must be a bare string: it can contain any printable non-whitespace character except the equals sign (=), and it must not begin with a pound sign (#). It is acceptable for a key to contain a pound sign if it is not the first character of the key. The value may be quoted with double quotes ("), in which case it is allowed to contain any printable character, including the equals sign, whitespace, and newlines. Quoted values understand the escape sequences \\ for a backslash and \" for a double quote; no other escape sequences are provided. An unquoted value is allowed to contain any printable non-whitespace character except the equals sign. No spaces are allowed around the equals sign.
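The rules above can be sketched from the producing side. This hypothetical Python helper, encode_pair, validates a key and decides whether a value needs quoting. It is an illustration of the rules as stated here, not the Haskell library's encoder. As an extra assumption, a value that begins with a double quote is always quoted, so that it cannot be mistaken for the start of a quoted value.

```python
def is_bare(text):
    """True if every character is printable, non-whitespace, and not '='."""
    return all(c.isprintable() and not c.isspace() and c != "=" for c in text)

def encode_pair(key, value):
    """Serialize one key-value pair, quoting the value only when required."""
    if not key or not is_bare(key) or key.startswith("#"):
        raise ValueError(f"invalid NDBL key: {key!r}")
    # The empty string counts as bare, so empty values are written as `key=`.
    if is_bare(value) and not value.startswith('"'):
        return f"{key}={value}"
    # Only two escapes exist: backslash and double quote.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{key}="{escaped}"'

print(encode_pair("port", "22"))
print(encode_pair("nicename", "H-G Remote Server"))
```

The two calls print `port=22` and `nicename="H-G Remote Server"` respectively.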

A group is a sequence of key-value pairs. A group is introduced by a non-indented key-value pair; all subsequent key-value pairs on the same line, as well as any key-value pairs on subsequent indented lines, belong to the same group.

A document is a sequence of groups.
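To make the structure concrete, here is a small hand-written scanner in Python that follows the description above. It is a sketch under stated assumptions, not the reference implementation: the names parse and read_pair are hypothetical, printability of characters is not checked, a comment is also accepted at the very start of the input, and an indented pair that appears before any group is rejected.

```python
WHITESPACE = " \t\r\n"

def read_pair(text, i):
    """Read one key=value pair starting at index i; return (key, value, next_i)."""
    n, start = len(text), i
    while i < n and text[i] not in WHITESPACE and text[i] != "=":
        i += 1
    key = text[start:i]
    if not key or i >= n or text[i] != "=":
        raise ValueError(f"expected key=value at offset {start}")
    i += 1  # skip '='
    if i < n and text[i] == '"':
        i += 1
        chars = []
        while i < n and text[i] != '"':
            if text[i] == "\\" and i + 1 < n and text[i + 1] in '\\"':
                i += 1  # drop the backslash, keep the escaped character
            chars.append(text[i])
            i += 1
        if i >= n:
            raise ValueError("unterminated quoted value")
        return key, "".join(chars), i + 1
    start = i
    while i < n and text[i] not in WHITESPACE and text[i] != "=":
        i += 1
    return key, text[start:i], i

def parse(text):
    """Parse NDBL text into a list of groups, each a list of (key, value) tuples."""
    groups = []
    i, n = 0, len(text)
    line_start = True   # no pair seen yet on the current line
    indented = False    # current line began with a space or tab
    while i < n:
        c = text[i]
        if c in "\r\n":
            line_start, indented = True, False
            i += 1
        elif c in " \t":
            indented = indented or line_start
            i += 1
        elif c == "#":
            # Only reachable after whitespace or at the start of input: a comment.
            while i < n and text[i] not in "\r\n":
                i += 1
        else:
            key, value, i = read_pair(text, i)
            if i < n and text[i] not in WHITESPACE:
                raise ValueError(f"expected whitespace at offset {i}")
            if line_start and not indented:
                groups.append([])  # a non-indented pair opens a new group
            elif not groups:
                raise ValueError("indented pair before any group")
            groups[-1].append((key, value))
            line_start = False
    return groups

doc = "host=machine1\n  host=machine2\nhost=machine3\n"
print(parse(doc))
```

The scanner keeps only a position and two flags besides its output, which reflects the point made earlier that the format can be processed by a finite state machine.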

Examples With Explanation

In the examples below, I will use [ key: value ] as shorthand to represent an ordered sequence of key-value pairs.

host=machine1
host=machine2

This parses to the following structure:

[[ "host": "machine1" ], [ "host": "machine2" ]]

Adding indentation merges the two multisets, as the second line is now considered a 'continuation' of the first group.

host=machine1
  host=machine2

This becomes

[[ "host": "machine1", "host": "machine2" ]]

A third non-indented line will then start a new group:

host=machine1
  host=machine2
host=machine3

This becomes

[ [ "host": "machine1"
  , "host": "machine2"
  ]
, [ "host": "machine3" ]
]

Empty values are permitted, and are idiomatically used as a 'header' for a group of configuration options:

database=
  file=file1.txt
  file=file2.txt
  file=file3.txt

This becomes

[["database": "", "file": "file1.txt", "file": "file2.txt", "file": "file3.txt"]]

Comments are allowed but must come after a whitespace character, which means that the following document contains no comments:

key=value#hello

This becomes

[["key": "value#hello"]]

But this document does contain a comment:

key=value #hello

This becomes

[["key": "value"]]

Comments can begin a line, as well.

# WARNING: do not change
host=hg-remote
  portforwarding= # subject to change
  hostname=hunter-gratzner.example.com
  port=22
  user=abu-al-walid
  nicename="H-G Remote Server"

This document corresponds to

[ [ "host": "hg-remote"
  , "portforwarding": ""
  , "hostname": "hunter-gratzner.example.com"
  , "port": "22"
  , "user": "abu-al-walid"
  , "nicename": "H-G Remote Server"
  ]
]

NDBL does not provide arbitrarily-nested hierarchical structures, a list type, a numeric type, a boolean type, or other niceties. This does not prevent the programmer from interpreting a value as numeric or boolean, or from using groups in an ad-hoc way to represent hierarchical data; it merely means that NDBL does not understand those as primitives, nor does it provide an interface in its API for understanding or manipulating them. For example, a tree structure could hypothetically be represented by a series of named nodes

name=A parent=root
name=B parent=A
name=C parent=A
name=D parent=C

but in the event that such structures arise, it would be better to switch from NDBL to a proper data storage format.
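For illustration only, this is roughly what consuming such an ad-hoc encoding looks like on the application side. The Python sketch assumes the four groups above have already been parsed into lists of (key, value) tuples; the function name children_by_parent is hypothetical.

```python
from collections import defaultdict

# The four groups from the tree example, as they would look once parsed.
groups = [
    [("name", "A"), ("parent", "root")],
    [("name", "B"), ("parent", "A")],
    [("name", "C"), ("parent", "A")],
    [("name", "D"), ("parent", "C")],
]

def children_by_parent(groups):
    """Map each parent name to the list of its children, in document order."""
    children = defaultdict(list)
    for group in groups:
        # dict() is safe here only because each key occurs once per group.
        fields = dict(group)
        children[fields["parent"]].append(fields["name"])
    return dict(children)

tree = children_by_parent(groups)
print(tree)
```

All of the tree semantics (which key names the node, which names the parent, what "root" means) live in this code rather than in the format, which is the reason a richer format is the better choice once such structures appear.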

A Regular Language?

I asserted above that NDBL is a regular language. This is true in the sense that the set of valid NDBL documents is a regular language, and so can be recognized by a finite automaton. However, documents cannot be parsed by the kind of regular expressions that you have in most programming environments: this is because there is no well-founded way of matching a group underneath a Kleene star. We can construct a regular expression that matches a single key-value pair, the unit from which groups are built:

([^# \t\r\n=][^ \t\r\n=]*)=("([^"\\]|\\\\|\\")*"|[^ \t\r\n=]*)

We could also write a regular expression that matches entire NDBL documents, using here {group} as shorthand for the regex above:

({group}((#[^\r\n]*[\r\n]*[ \t]*|[\r\n]*[ \t])*{group})*[\r\n]*)*

But without structural regular expressions we cannot use this regex to actually pick apart the structure described by an NDBL document.
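The limitation can be seen with Python's re module. The sketch below uses a lightly adapted form of the key-value regex (the quoted-value branch is written as an explicit alternation and its inner group is non-capturing); the variable names are my own. Matching one pair at a time works, but once the pair sits under a star, each capture group remembers only its final repetition.

```python
import re

# Adapted key-value regex: group 1 is the key, group 2 is the value.
PAIR = r'([^# \t\r\n=][^ \t\r\n=]*)=("(?:[^"\\]|\\\\|\\")*"|[^ \t\r\n=]*)'

line = "name=A parent=root"

# One pair per match: this works, but the grouping of pairs into
# groups still has to be reconstructed by surrounding code.
pairs = [(m.group(1), m.group(2)) for m in re.finditer(PAIR, line)]

# The same regex under a star matches the whole line, yet the capture
# groups retain only the last repetition.
whole = re.fullmatch(rf'(?:{PAIR}[ \t]*)*', line)
last = (whole.group(1), whole.group(2))

print(pairs)
print(last)
```

Here pairs holds both key-value pairs, while last holds only ("parent", "root"): the regex accepted the line but the first pair is no longer recoverable from the match.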

Haskell API

At present, the Haskell API exposes an encode/decode pair which (respectively) produce and consume lists of lists of key-value pairs. Future work includes typeclasses that make it easier to produce and consume datatypes which correspond to NDBL documents.