We used to use centralized version control systems, like SVN. Those had a single canonical version of the repository, and multiple scattered copies that were "checked out" from the central repo. You synchronized your working copy with that repo, made some changes, and "pushed" those changes back to the central repo.

Now, most of the computing industry has switched over to decentralized version control systems, like Mercurial or Git. In these, any repo can pull from any other: if you and a friend are collaborating on a project, and both of you have publicly visible repositories, you can ping your friend to tell them about new changes you've made and they can pull them from you, and then they can make further changes and you can pull from them. In this setup, there is no "central repository", at least in the technical sense. If this project grows, it's probably a good idea to choose one person's repo to be the "canonical" one, but as far as the underlying technology is concerned, it's just another copy: it's only special in a social sense.[1]
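
To make the "just another copy" point concrete, here's a minimal sketch in Python. It's a toy model, not how Git actually stores anything: the `Repo` class, its `pull` method, and the commit IDs `a1` and `b1` are all made up for illustration, and a repository is reduced to a flat set of commit IDs rather than a real commit graph.

```python
# Toy model of decentralized version control: each repository is just a set
# of commit IDs, and "pull" copies over whatever the other side has.
# Illustration only; real Git tracks a commit graph, not a flat set.


class Repo:
    def __init__(self, owner):
        self.owner = owner
        self.commits = set()

    def commit(self, commit_id):
        """Record a new local change."""
        self.commits.add(commit_id)

    def pull(self, other):
        """Copy over every commit the other repo has that we lack."""
        new = other.commits - self.commits
        self.commits |= new
        return new


mine = Repo("me")
theirs = Repo("friend")

mine.commit("a1")
theirs.pull(mine)    # my friend pulls my change
theirs.commit("b1")
mine.pull(theirs)    # I pull their follow-up change

# Both copies now hold the same history, and neither one was "central".
in_sync = mine.commits == theirs.commits
print(in_sync)
```

Nothing in `Repo` marks one copy as special: `pull` works the same in either direction, so which repo counts as "canonical" has to be decided outside the code.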

Github and similar services were built to support this paradigm: I can create a repository, and anyone who wants can go in and, with a click, fork the repository, which copies it into their own namespace. You can pull changes from one or the other, or notify someone that you have changes you'd like them to consider with a pull request.[2] Github tends to think of the original copy of a repo as "canonical", but that's merely a convention.

On the other hand, Github itself is not decentralized: it offers a centralized toolset for managing a decentralized technology. This gets criticized regularly, especially when Github has a major outage and programmers all around the world can't get work done. Github also offers more tooling on top of just repo hosting and management: project wikis, issue tracking, commenting in various places, and so forth. So let's think about what a distributed Github would look like.

A Programmer's View

Let's call our distributed Git software GitNode.

Say I run an instance of GitNode on my personal web server at http://gitnode.gdritter.com/: this server contains copies of all the git repos I care about, and the GitNode server is aware of them, so if nothing else it gives me a nice browsable view of the state of the repo, past commits, and so forth.

My friend also runs an instance of GitNode at http://gitnode.example.com/, and they have a repo I want to work on called my-project. There are two ways I can do this:

  • I can go to my own GitNode instance and tell it the location of the repo I want to clone: that is, point it to gitnode.example.com:my-project, and it'll clone it for me. I could also write a bookmarklet or browser plugin to make this easier, to avoid having to retype or copy/paste things in.
  • I could also perform a visit to my friend's GitNode instance. I navigate to gitnode.example.com and click a Visit button, which

  1. I'm aware that I'm ignoring pushes here: you could also use git and friends basically the same way you would SVN, by having a central repository that everyone pulls from and pushes to. This is true, but what's interesting about distributed version control is that you don't have to do so. In fact, in the situation I'm talking about, you would probably have at least four repos: two of them on servers with a stable, addressable location, and two or more on local machines being worked on. Each server copy acts kind of like a 'central' copy for the programmer who owns it, and the pulls go back and forth between them.

  2. I'd argue this is a poor piece of terminology: the phrase is ambiguous enough that I originally believed it had something to do with asking permission to clone someone's work. I've heard the alternative phrase merge request used, which has the problem that it's less accurate to the underlying abstraction—it may or may not incur a merge in Git's sense—but it does hint at the actual operation happening a bit more. I'm open to alternatives!