<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<p>I basically always use some program in the <code>daemontools</code> family on my computers. My home laptop and desktop are booted with an init system (<code>runit</code>) based on <code>daemontools</code>, while many of the systems I set up elsewhere boot a vanilla distribution but immediately set up a <code>daemontools</code> service directory as a secondary service management tool. Quite frankly, it's one of the best examples of good Unix design and at this point I wouldn't want to go without it.</p>
<p>This is a high-level introduction to the <em>idea</em> of <code>daemontools</code> rather than a full tutorial: to learn how to set it up in practice, <a href="http://cr.yp.to/daemontools.html">djb's own site</a> as well as <a href="http://rubyists.github.io/2011/05/02/runit-for-ruby-and-everything-else.html">a handful</a><sup><a href="#fn1" class="footnoteRef" id="fnref1">1</a></sup> <a href="http://www.troubleshooters.com/linux/djbdns/daemontools_intro.htm">of others</a> are better references.</p>
<h1 id="what-is-daemontools">What is Daemontools?</h1>
<p>The <em>core</em> of <code>daemontools</code> is just two programs: <code>svscan</code> and <code>supervise</code>. They're very straightforward: <code>svscan</code> takes a single optional argument, and <code>supervise</code> takes a single mandatory one.</p>
<p><a href="http://cr.yp.to/daemontools/svscan.html"><code>svscan</code></a> watches a directory (if none is specified, then it will watch the current working directory) and checks to see if new directories have been added. Any time a new directory is added, it starts an instance of <code>supervise</code> pointing at that new directory<sup><a href="#fn2" class="footnoteRef" id="fnref2">2</a></sup>.</p>
<p>And that's all that <code>svscan</code> does.</p>
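<p>To make that concrete, here is a rough sketch of a single scanning pass in plain <code>sh</code>. It illustrates the idea and is not how <code>svscan</code> is actually implemented: the function name <code>scan_once</code> and the <code>seen</code> variable are made up for this example, and where the real program would start <code>supervise</code>, the sketch only prints what it would do.</p>
<pre><code>#!/bin/sh
# Hypothetical sketch of one svscan-style pass; not the real implementation.
seen=" "    # names already handed to a supervisor, space-separated

# scan_once DIR: report every subdirectory of DIR not seen on an earlier pass.
scan_once() {
    for entry in "$1"/*/; do
        [ -d "$entry" ] || continue     # the glob matched nothing
        name=$(basename "$entry")
        case "$seen" in
            *" $name "*) continue ;;    # already being supervised
        esac
        seen="$seen$name "
        echo "would start: supervise $1/$name"
    done
}</code></pre>
<p>Calling <code>scan_once</code> on the same directory twice prints nothing the second time unless a new subdirectory has appeared in between. The sketch assumes service names without spaces, and wrapping it in a loop with a sleep between passes would give the periodic behavior.</p>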
<p><a href="http://cr.yp.to/daemontools/supervise.html"><code>supervise</code></a> switches to the supplied directory and runs a script there called <code>./run</code>. If <code>./run</code> stops running for <em>any reason</em>, it will be started again (after a short pause, to avoid hammering the system). It will not start the <code>./run</code> script at all if a file called <code>./down</code> exists in the same directory. Extra data about the running process gets stored in a subdirectory called <code>supervise</code>, and a few other tools can be used to prod and modify that data: for example, to send signals that kill the running program, to temporarily stop it, or to see how long it has been running.</p>
<p>And that's almost all that <code>supervise</code> does.</p>
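<p>The restart behavior can be sketched in a few lines of <code>sh</code> as well. This is a toy under stated assumptions, not the real <code>supervise</code>: the name <code>supervise_sketch</code> and its second argument (a cap on the number of starts, so that the example terminates) are invented here, and the real program loops forever and also maintains its status directory.</p>
<pre><code>#!/bin/sh
# supervise_sketch DIR MAX: run DIR/run up to MAX times, restarting it
# whenever it exits. Hypothetical sketch; the body runs in a subshell so
# the cd does not leak into the caller.
supervise_sketch() (
    cd "$1" || exit 1
    if [ -e ./down ]; then
        echo "down file present; not starting"
        exit 0
    fi
    starts=0
    while [ "$starts" -lt "$2" ]; do
        ./run                           # blocks until the service exits
        echo "run exited with status $?"
        starts=$((starts + 1))
        sleep 1                         # short pause before the next start
    done
)</code></pre>
<p>Note that the loop never inspects <em>why</em> <code>./run</code> exited. Any exit at all leads to another start, which is what keeps the model so small.</p>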
<p>One extra minor wrinkle is that if <code>supervise</code> is pointed at a directory that also contains a subdirectory called <code>./log</code>, and <code>./log/run</code> also exists, then it will monitor that executable <em>as well</em> and point the stdout of <code>./run</code> to the stdin of <code>./log/run</code>. This allows you to build a custom logging solution for your services if you'd like. The <code>./log</code> directory is optional.</p>
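<p>A logger in this scheme is nothing more than a program that reads lines on stdin. As a minimal sketch, assuming you wanted to hand-roll one rather than <code>exec</code> the <code>multilog</code> program that ships with <code>daemontools</code>, it could look like the following (the function name <code>log_lines</code> is hypothetical):</p>
<pre><code>#!/bin/sh
# log_lines FILE: append each line read from stdin to FILE, prefixed
# with a timestamp. Hypothetical stand-in for a real logging program.
log_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$line" &gt;&gt; "$1"
    done
}</code></pre>
<p>A <code>./log/run</code> script built around this would simply call <code>log_lines</code> with the path of the log file, and everything the service writes to stdout would end up there.</p>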
<p>So, how does this run a system? Well, you point <code>svscan</code> at a directory that contains a subdirectory for each service you want to run. Those services are generally small shell scripts that call the appropriate daemon in such a way that it will stay in the foreground. For example, a script to run <code>sshd</code> might look like:</p>
<pre><code>#!/bin/sh

# redirecting stderr to stdout
exec 2&gt;&amp;1

# the -D option keeps sshd in the foreground
# and the -e option writes log information to stderr
exec /usr/sbin/sshd -D -e</code></pre>
<p>And your directory structure might look like</p>
<pre><code>-+-service/
 |-+-ngetty/
 | |---run
 | |-+-log/
 |   |---run
 |-+-sshd/
 | |---run
 | |-+-log/
 |   |---run
 |-+-crond/
 | |---run
 | |-+-log/
 |   |---run</code></pre>
<p>Once you point <code>svscan</code> at this, you end up with a process tree in which <code>svscan</code> manages multiple <code>supervise</code> instances, each of which in turn manages its service and that service's logger:</p>
<pre><code>-svscan-+-supervise-+-ngetty
        |           `-log-service
        +-supervise-+-sshd
        |           `-log-service
        +-supervise-+-crond
        |           `-log-service</code></pre>
<p>This design has some pretty amazing practical advantages, many of which are attributable to the fact that <code>daemontools</code> <em>is written in terms of Unix idioms</em>. The &quot;Unix way&quot; gets a fair amount of derision—some well-deserved, some not—but <code>daemontools</code> is a good example of how embracing the idioms of your system can produce better, more flexible software. Consider the following problems and their <code>daemontools</code> solutions:</p>
<h2 id="testing-a-service-before-you-start-it">Testing a Service Before You Start It</h2>
<p>The <code>./run</code> script is a plain executable. If it runs and stays in the foreground, doing what it should do, it's correct. If it doesn't, then there's a problem. That's also the only code path, which is a sharp contrast to the infamously difficult-to-write <code>sysvinit</code> scripts, where <code>start</code> and <code>stop</code> and <code>status</code> and so forth must all be tested in various system states<sup><a href="#fn3" class="footnoteRef" id="fnref3">3</a></sup>.</p>
<h2 id="starting-and-stoping-a-service">Starting and Stopping a Service</h2>
<p>All you do is create or delete a service directory. The most common way of doing this is to create the service directory elsewhere, and then start it by creating a symlink to it inside the directory that <code>svscan</code> watches. This lets you delete the symlink without deleting the directory it points to, and furthermore ensures that the 'creation' of the directory is atomic: the service appears fully populated in a single step, rather than being noticed while it is half-written.</p>
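<p>The symlink dance looks something like this. The layout is an assumption made for the example: definitions live in an <code>sv</code> directory, the scanned tree is called <code>service</code>, and both sit under a throwaway temporary directory so the snippet is safe to run anywhere.</p>
<pre><code>#!/bin/sh
# Hypothetical layout: definitions in $base/sv, scanned tree in $base/service.
base=$(mktemp -d)
mkdir -p "$base/sv/sshd" "$base/service"

# "Start": the finished service directory appears in the scanned tree
# in a single step.
ln -s "$base/sv/sshd" "$base/service/sshd"

# "Stop": remove only the link; the definition under sv/ is untouched.
rm "$base/service/sshd"</code></pre>
<p>How promptly an already-running supervisor reacts to the removed link differs between members of the family, so check the documentation of the one you use before relying on this alone to stop a service.</p>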
<p>Another tool, <code>svc</code>, lets you send signals to the running processes: for example, <code>svc -p</code> sends a <code>STOP</code> signal, while <code>svc -d</code> sends a <code>TERM</code> signal and also tells <code>supervise</code> not to restart the service afterwards.</p>
<h2 id="express-service-dependencies">Express Service Dependencies</h2>
<p>The <code>daemontools</code> design allows for various helper tools. One of them is <code>svok</code>, which checks whether <code>supervise</code> is running in a given service directory. This is just another Unix program that will exit with either <code>0</code> if it is, or <code>100</code> if it is not. That means we can write</p>
<pre><code>#!/bin/sh
# svok takes a service directory; /service is assumed to be the scanned tree
svok /service/postgres || exit 1
exec 2&gt;&amp;1
exec python2 some-web-app.py</code></pre>
<p>and the script will exit (prompting <code>supervise</code> to wait a moment and then restart it) unless <code>postgres</code> is already running.</p>
<h2 id="express-resource-limits">Express Resource Limits</h2>
<p><code>daemontools</code> has several other applications that can enforce various resource limits or permissions. These are not part of the service mechanism; instead, they simply modify <em>the current process</em> and then <code>exec</code> some other command. That means that you can easily incorporate them into a service script:</p>
<pre><code>#!/bin/sh
exec 2&gt;&amp;1
# change to the user &#39;sample&#39;, and then limit the stack segment
# to 2048 bytes, the number of open file descriptors to 2, and
# the number of processes to 1:
exec setuidgid sample \
     softlimit -s 2048 -o 2 -p 1 \
     some-small-daemon -n</code></pre>
<p>These aren't actually special, and don't have anything to do with the <code>daemontools</code> service mechanism. Any shell script can incorporate <code>setuidgid</code> or <code>softlimit</code>, even if those scripts have nothing to do with service management!</p>
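<p>The same modify-then-<code>exec</code> shape is easy to reproduce yourself. Here is a minimal sketch using the shell's built-in <code>ulimit</code>; the helper name <code>limit_files</code> is hypothetical, and it only imitates the open-file part of what <code>softlimit</code> does.</p>
<pre><code>#!/bin/sh
# limit_files N CMD...: lower the soft open-file limit to N, then replace
# the process with CMD. Hypothetical helper; it runs in a subshell so the
# caller's own limit is left alone.
limit_files() (
    ulimit -S -n "$1" || exit 1
    shift
    exec "$@"
)

# The limit is inherited by whatever comes next in the chain:
limit_files 64 sh -c 'ulimit -n'    # prints 64</code></pre>
<p>Because the helper ends in <code>exec</code>, helpers of this kind can be stacked in front of a command in any order, each one adjusting the process a little before handing it on.</p>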
<h2 id="allow-user-level-services">Allow User-Level Services</h2>
<p>If I want a given <em>user</em> to have their own services that are run <em>as</em> that user, all I need to do is have another <code>svscan</code> running as that user and pointing at another directory, which I can run as another top-level service:</p>
<pre><code>#!/bin/sh
exec 2&gt;&amp;1
exec setuidgid user \
     /usr/sbin/svscan /home/user/service</code></pre>
<h1 id="variations">Variations</h1>
<p>What I described above was vanilla <code>daemontools</code>. Other systems are designed for booting entire systems with this kind of service management. Variations on this basic design add various features:</p>
<ul>
<li>The <a href="http://smarden.org/runit/"><code>runit</code></a> package extends <code>supervise</code> with the ability to execute a <code>./finish</code> script whenever the <code>./run</code> script exits, to do various kinds of cleanup. (<code>runit</code> renames <code>svscan</code> and <code>supervise</code> to <code>runsvdir</code> and <code>runsv</code>, respectively.)</li>
<li>The <a href="http://skarnet.org/software/s6/index.html"><code>s6</code></a> package adds even more options to both core programs (which are here named <code>s6-svscan</code> and <code>s6-supervise</code>) to e.g. limit the maximum number of services or modify how often scanning is done. It additionally allows control of an <code>s6-supervise</code> instance through a directory of FIFOs called <code>./event</code>.</li>
<li>The <a href="http://untroubled.org/daemontools-encore/"><code>daemontools-encore</code></a> package adds even more optional scripts: a <code>./start</code> script which is run before the main <code>./run</code> script and a <code>./stop</code> script after the service is disabled, a <code>./notify</code> script which is invoked when the service changes, and a few others.</li>
<li>The <a href="http://homepage.ntlworld.com/jonathan.deboynepollard/Softwares/nosh.html"><code>nosh</code></a> package is designed as a drop-in replacement for <code>systemd</code> on platforms where <code>systemd</code> cannot run (i.e. any Unix that is not a modern Linux) and so has a lot of utilities that superficially emulate <code>systemd</code> as well as tools which can convert <code>systemd</code> units into <code>nosh</code> service directories. <code>nosh</code> is the most radically divergent of the bunch, but is clearly a <code>daemontools</code> descendant (and incorporates most of the changes from <code>daemontools-encore</code>, as well.)</li>
</ul>
<p>Additionally, all these (except for <code>daemontools-encore</code>) have other capabilities used to set up a Unix system before starting the service-management portion.</p>
<h1 id="the-takeaway">The Takeaway</h1>
<p>The whole <code>daemontools</code> family has two properties which I really appreciate:</p>
<ol style="list-style-type: decimal">
<li>A strong commitment to never parsing anything.</li>
<li>A strong commitment to using Unix as a raw material.</li>
</ol>
<h2 id="why-avoid-parsing">Why avoid parsing?</h2>
<p>Parsing is a surprisingly difficult thing to get right. Techniques for writing parsers vary wildly in terms of how difficult they are, and <a href="http://www.cs.dartmouth.edu/~sergey/langsec/occupy/">parsing bugs are a common source</a> of <a href="https://en.wikipedia.org/wiki/Weird_machine">weird machines</a> in computer security. Various techniques can make parsing easier and less bug-prone, but it's a dangerous thing to rely on.</p>
<p>One way to get around this is to just skip parsing altogether. This is difficult in Unix, where most tools consume and emit plain text (or plain binary.) In other systems, such as in individual programming environments or systems like Windows PowerShell, the everything-is-plain-text requirement is relaxed, allowing tools to exchange structured data without reserializing and reparsing.</p>
<p>The way to avoid parsing <em>in Unix</em> is to use various kinds of structure to your advantage. Take the file system: it can, used correctly, emulate a tree-like structure or a key-value store. For example, one supplementary <code>daemontools</code> utility is <code>envdir</code>, which reads in environment variables not by parsing a string of <code>name=value</code> pairs, but by looking at a directory and turning the filename-to-file-contents mapping into a variable-name-to-variable-content mapping.</p>
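<p>As a rough illustration of how little machinery that takes, here is a sketch of the same idea in plain <code>sh</code>. It is not the real <code>envdir</code>, which handles more edge cases; the name <code>envdir_sketch</code> is made up, and the sketch assumes every file name in the directory is a valid variable name.</p>
<pre><code>#!/bin/sh
# envdir_sketch DIR CMD...: export one variable per file in DIR (name is
# the file name, value is the first line of the file), then run CMD.
# Hypothetical sketch; runs in a subshell so the caller's environment
# is not modified.
envdir_sketch() (
    dir=$1
    shift
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        value=$(head -n 1 "$f")
        export "$(basename "$f")=$value"
    done
    exec "$@"
)</code></pre>
<p>Given a directory containing a file <code>PGPORT</code> whose first line is <code>5432</code>, running a command through <code>envdir_sketch</code> puts <code>PGPORT=5432</code> in that command's environment. At no point does anything have to split a string on <code>=</code> or worry about quoting.</p>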
<p>You might argue that this is silly—after all, parsing an environment variable declaration is as easy as <code>name=value</code>! Could a system really introduce a security bug in parsing something as simple as that? As it happens, <a href="https://en.wikipedia.org/wiki/Shellshock_%28software_bug%29">the answer is yes.</a></p>
<p>So <code>daemontools</code> avoids parsing by using directories as an organizing principle, rather than parsing configuration files.<sup><a href="#fn4" class="footnoteRef" id="fnref4">4</a></sup> This is very much a design principle in its favor.</p>
<h2 id="what-is-unix-as-a-raw-material">What is &quot;Unix as a raw material&quot;?</h2>
<p>The building blocks of <code>daemontools</code> are the parts of Unix which are common to every modern Unix variant: directories and executables and Unix processes and (in some of its descendants) FIFOs. This means you have a universe of actions you can perform outside of the <code>daemontools</code> universe:</p>
<ul>
<li>Your scripts can be written in anything you'd like, not just a shell language. You could even drop a compiled executable in, at the cost of later maintainability.</li>
<li>Similarly, <code>daemontools</code> services are trivially testable, because they're just plain ol' executables.</li>
<li>Lots of details get moved out of service management because they can be expressed in terms of other building blocks of the system. There's no need for a 'which user do I run as' configuration flag, because that can get moved into a script. (Although that script can also consult an external configuration for that, if you'd like!)</li>
<li>Your directories can be arranged in various ways, being split up or put back together however you'd like.<sup><a href="#fn5" class="footnoteRef" id="fnref5">5</a></sup></li>
</ul>
<p>In contrast, service management with <code>upstart</code> or <code>systemd</code> requires special configuration files and uses various other RPC mechanisms, which means that interacting with them requires using the existing tools and… isn't really otherwise possible. Testing a service with <code>upstart</code> or <code>systemd</code> requires some kind of special testing tool in order to parse the service description and set up the environment it requests. Dependency-management must be built in, and couldn't have been added in afterwards. The same goes for resource limits or process isolation. …and so forth.</p>
<p>&quot;Unix design&quot; has sometimes been used to justify some very poor design choices, but well-done system design that embraces the Unix building blocks in sensible ways has a lot in common with functional program design: small building blocks that have well-defined scope and semantics, well-defined side effects (if any), and fit well into a larger system by making few assumptions about what exists outside of them. <code>daemontools</code> is a perfect example of Unix design done well.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>This one is about <code>runit</code>, not <code>daemontools</code>, but they are similar enough in principle.<a href="#fnref1"></a></p></li>
<li id="fn2"><p>It does this not with <code>inotify</code> or a similar notification mechanism, but simply by waking up every five seconds and doing a quick traversal of everything in the directory. This is less efficient, but also makes fewer assumptions about the platform it's running on, which means <code>daemontools</code> can run just about anywhere.<a href="#fnref2"></a></p></li>
<li id="fn3"><p>Of course, your daemon might still rely on state—but that's the fault of your daemon, and no longer inherent in the service mechanism. Contrast this to <code>sysvinit</code>-style scripts, where the only possible API is a stateful one in which the script does different things depending on the process state.<a href="#fnref3"></a></p></li>
<li id="fn4"><p>One might argue that this is a little bit disingenuous: after all, you're still invoking shell scripts! If one part of your system avoids parsing, but then you call out to a piece of software as infamously complicated and buggy as <code>bash</code>, all that other security is for naught. But there's no reason that you <em>have</em> to write your scripts in <code>bash</code>, and in fact, the creator of <code>s6</code> has built a new shell replacement for that purpose: namely, <a href="http://skarnet.org/software/execline/"><code>execline</code></a>, which is designed around both security and performance concerns. If you wanted, you could replace all those shell scripts with something else, perhaps something more like the <a href="http://shill.seas.harvard.edu/"><code>shill</code> language</a>. Luckily, the <code>daemontools</code> way is agnostic as to what it's executing, so it is easy to adopt these tools as well!<a href="#fnref4"></a></p></li>
<li id="fn5"><p>I personally tend to have a system-level <code>/etc/sv</code> for some services and a user-level <code>/home/gdritter/sv</code> for other services, regardless of whether those services are run in my user-level service tree in <code>/home/gdritter/service</code> or the root-level tree in <code>/service</code>.<a href="#fnref5"></a></p></li>
</ol>
</div>
</body>
</html>