Started distributed git draft
Getty Ritter
8 years ago
We used to use centralized version control systems, like SVN. Those
had a single _canonical_ version of the repository, and multiple
scattered copies that were "checked out" from the central repo. You
would synchronize your current state with that repo, make some
changes, and "push" those changes back to the central repo.

Now, most of the computing industry has switched over to
decentralized version control systems, like Mercurial or Git. In
these, any repo can pull from any other: if you and a friend are
collaborating on a project, and both of you have publicly visible
repositories, you can ping your friend to tell them about new
changes you've made and they can pull them from you, and then
they can make further changes and you can pull from them. In
this setup, there is no "central repository", at least in the
technical sense. If this project grows, it's probably a good
idea to choose one person's repo to be the "canonical" one, but
as far as the _underlying technology_ is concerned, it's just another
copy: it's only special in a _social_ sense.[^push]

[^push]: I'm aware that I'm ignoring _pushes_ here: you could also use
Git and friends basically the same way you would SVN, by having a
central repository that everyone pulls from and pushes to. This is
true, but what's interesting about distributed version control is
that you don't _have to_ do so. In fact, in the situation I'm
describing, you would probably have at least four repos: two of them
on servers with a stable, addressable location, and two or more
on local machines being worked on. Each server copy acts kind of
like a 'central' copy for the programmer who owns it, and the
pulls go back and forth between them.

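To make the symmetry concrete, here's a minimal sketch in Python of the
pull-only workflow described above. It's a toy model, not real Git: the
`Repo` class, its `commit` and `pull` methods, and the commit IDs are all
made up for illustration, and history is flattened to an unordered set of
IDs rather than Git's actual commit graph.

```python
class Repo:
    """A toy repository: just a named set of commit IDs (no real Git involved)."""

    def __init__(self, name):
        self.name = name
        self.commits = set()

    def commit(self, commit_id):
        """Record a new local change."""
        self.commits.add(commit_id)

    def pull(self, other):
        """Copy over every commit `other` has that this repo lacks.

        Returns the set of commit IDs that were newly added.
        """
        new = other.commits - self.commits
        self.commits |= new
        return new


# Two collaborators, each with their own publicly visible repo.
mine = Repo("mine")
friends = Repo("friends")

mine.commit("a1")
friends.pull(mine)    # my friend pulls my change
friends.commit("b1")
mine.pull(friends)    # and then I pull theirs

# Both repos now hold the same commits; neither one is technically special.
assert mine.commits == friends.commits == {"a1", "b1"}
```

Nothing in `Repo` marks one copy as canonical. Calling one of them "the
main repo" would be a decision made outside the code, which is the
_social_ sense mentioned above.
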
GitHub and similar services were built to support this paradigm: I
can create a repository, and anyone who wants to can go in and, with
a click, _fork_ the repository, which copies it into their own
namespace. You can pull changes from one or the other, or notify
someone that you have changes you'd like them to consider with
a _pull request_[^pr]. GitHub tends to think of the original copy
of a repo as "canonical", but that's merely a convention.

[^pr]: I'd argue this is a poor piece of terminology: the phrase
is ambiguous enough that I originally believed it had something
to do with asking permission to clone someone's work. I've heard
the alternative phrase _merge request_ used, which has the problem
that it's less accurate to the underlying abstraction (it may or
may not incur a merge in Git's sense), but it _does_ hint at the
actual operation happening a bit more. I'm open to alternatives!

On the other hand, GitHub itself is _not_ decentralized: it offers
a centralized toolset for managing a decentralized technology.
This gets criticized regularly, especially when GitHub has a major
outage and programmers all around the world can't get work done.
GitHub also offers tooling beyond just repo hosting and
management: project wikis, issue tracking, commenting in various
places, and so forth. So let's think about what a distributed
_GitHub_ would look like.

# A Programmer's View

Let's call our distributed Git software _GitNode_.

Say I run an instance of GitNode on my personal web server at
`http://gitnode.gdritter.com/`: this server contains copies of
all the Git repos I care about, and the GitNode server is aware
of them, so if nothing else it gives me a nice browsable view of
the state of each repo, its past commits, and so forth.

My friend _also_ runs an instance of GitNode at
`http://gitnode.example.com/`, and they have a repo I want to
work on called `my-project`. There are two ways I can do this:

- I can go to my own GitNode instance and tell it the location
  of the repo I want to clone: that is, point it to
  `gitnode.example.com:my-project`, and it'll clone it for me.
  I could also write a bookmarklet or browser plugin to make this
  easier, so I don't have to retype or copy and paste the location.
- I could also _perform a visit_ to my friend's GitNode instance.
  I navigate to `gitnode.example.com` and click a _Visit_ button,
  which