We used to use centralized version control systems, like SVN. Those
had a single _canonical_ version of the repository, and multiple
scattered copies that were "checked out" from the central repo. You
would synchronize your current state with that repo, make some changes,
and "push" those changes back to the central repo.

Now, most of the computing industry has switched over to
decentralized version control systems, like Mercurial or Git. In
these, any repo can pull from any other. If you and a friend are
collaborating on a project, and both of you have publicly visible
repositories, you can ping your friend to tell them about new
changes you've made and they can pull them from you; then
they can make further changes and you can pull from them. In
this setup, there is no "central repository", at least in the
technical sense. If this project grows, it's probably a good
idea to choose one person's repo to be the "canonical" one, but
as far as the _underlying technology_ is concerned, it's just another
copy: it's only special in a _social_ sense.[^push]

[^push]: I'm aware that I'm ignoring _pushes_ here: you could also use
git and friends basically the same way you would SVN, by having a
central repository that everyone pulls from and pushes to. This is
true, but what's interesting about distributed version control is
that you don't _have to_ do so. In fact, in the situation I'm talking
about, you would probably have at least four repos: two of them
on servers with a stable, addressable location, and two or more
on local machines being worked on. Each server copy acts kind of
like a 'central' copy for the programmer who owns it, and the
pulls go back and forth between them.
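
To make the pull-based setup above a bit more concrete, here's a toy sketch in Python with four repos: two on servers and two on local machines. This is _not_ how git works internally. The `Repo` class, the repo names, and the commit IDs are all made up for illustration, and a real repository tracks a history graph rather than a flat set of IDs. The only point is that every transfer happens between peers, and no repo is technically special.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """A toy repository: a name plus the set of commit IDs it holds.

    Hypothetical model for illustration only; real git stores a commit graph.
    """
    name: str
    commits: set = field(default_factory=set)

    def commit(self, commit_id: str) -> None:
        # Record a new local change.
        self.commits.add(commit_id)

    def pull(self, other: "Repo") -> set:
        # Copy over whatever `other` has that we lack; return what was new.
        new = other.commits - self.commits
        self.commits |= new
        return new

    def push(self, other: "Repo") -> set:
        # A push is the same transfer, just initiated from the sending side.
        return other.pull(self)


# Two people, each with a local working copy and a publicly visible server copy.
my_laptop, my_server = Repo("me/laptop"), Repo("me/server")
friend_laptop, friend_server = Repo("friend/laptop"), Repo("friend/server")

# I make a change locally and publish it to my own server copy.
my_laptop.commit("a1")
my_laptop.push(my_server)

# My friend pulls from my server, builds on it, and publishes in turn.
friend_laptop.pull(my_server)
friend_laptop.commit("b1")
friend_laptop.push(friend_server)

# I pull their work back from their server copy.
my_laptop.pull(friend_server)

print(sorted(my_laptop.commits))  # ['a1', 'b1']
```

If the project did pick a "canonical" repo, it would just be whichever `Repo` everyone agrees to pull from. Nothing in the model (or in the underlying technology) marks it as different from the others.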

Github and similar services were built to support this paradigm: I
can create a repository, and anyone who wants can go in and, with
a click, _fork_ the repository, which copies it into their own
namespace. You can pull changes from one or the other, or notify
someone that you have changes you'd like them to consider with
a _pull request_[^pr]. Github tends to think of the original copy
of a repo as "canonical", but that's merely a convention.

[^pr]: I'd argue this is a poor piece of terminology: the phrase
is ambiguous enough that I originally believed it had something
to do with asking permission to clone someone's work. I've heard
the alternative phrase _merge request_ used, which has the problem
that it's less accurate to the underlying abstraction—it may or
may not incur a merge in Git's sense—but it _does_ hint at the
actual operation happening a bit more. I'm open to alternatives!

On the other hand, Github itself is _not_ decentralized: it offers
a centralized toolset for managing a decentralized technology.
This gets criticized regularly, especially when Github has a major
outage and programmers all around the world can't get work done.
Github also offers more tooling on top of just repo hosting and
management: project wikis, issue tracking, commenting in various
places, and so forth. So let's think about what a distributed
_Github_ would look like.

# A Programmer's View

Let's call our distributed Git software _GitNode_.

Say I run an instance of GitNode on my personal web server at
`http://gitnode.gdritter.com/`: this server contains copies of
all the git repos I care about, and the GitNode server is aware
of them, so if nothing else it gives me a nice browsable view of
the state of the repo, past commits, and so forth.

My friend _also_ runs an instance of GitNode at
`http://gitnode.example.com/`, and they have a repo I want to
work on called `my-project`. There are two ways I can do this:

- I can go to my own GitNode instance and tell it the location
of the repo I want to clone: that is, point it to
`gitnode.example.com:my-project`, and it'll clone it for me.
I could also write a bookmarklet or browser plugin to make this
easier, to avoid having to retype or copy/paste things in.
- I could also _perform a visit_ to my friend's GitNode instance.
I navigate to `gitnode.example.com` and click a _Visit_ button,
which