Commit f398fd7783f54601fdc3a2b954bfc1582d77add8 - documents

Started distributed git draft Getty Ritter 9 years ago

1 changed file(s) with 77 addition(s) and 0 deletion(s).

posts/distributed-git.md

	1	We used to use centralized version control systems, like SVN. Those
	2	had a single _canonical_ version of the repository, and multiple
	3	scattered copies that were "checked out" from the central repo. You
	4	would synchronize your current state with that repo, make some changes,
	5	and "push" those changes back to the central repo.
	6
	7	Now, most of the computing industry has switched over to
	8	decentralized version control systems, like Mercurial or git. In
	9	these, any repo can pull from any other: if you and a friend are
	10	collaborating on a project, and both of you have publicly visible
	11	repositories, you can ping your friend to tell them about new
	12	changes you've made and they can pull them from you. Then
	13	they can make further changes and you can pull from them. In
	14	this setup, there is no "central repository", at least in the
	15	technical sense. If this project grows, it's probably a good
	16	idea to choose one person's repo to be the "canonical" one, but
	17	as far as the _underlying technology_ is concerned, it's just another
	18	copy: it's only special in a _social_ sense.[^push]
	19
	20	[^push]: I'm aware that I'm ignoring _pushes_ here: you could also use
	21	git and friends basically the same way you would SVN, by having a
	22	central repository that everyone pulls from and pushes to. This is
	23	true, but what's interesting about distributed version control is
	24	that you don't _have to_ do so. In fact, in the situation I'm talking
	25	about, you would probably have at least four repos: two of them
	26	on servers with a stable, addressable location, and two or more
	27	on local machines being worked on. Each server copy acts kind of
	28	like a 'central' copy for the programmer who owns it, and the
	29	pulls go back and forth between them.
	30
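To make the pull-only workflow concrete, here's a toy sketch in Python. It is _not_ how git stores or transfers anything: the `Repo` class, its `commit` and `pull` methods, and the commit ids are all made up for illustration, and a "repository" is assumed to be nothing more than a set of commit ids. The point is only that neither repo is special: each one pulls whatever the other has that it lacks.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical stand-in for a repository: an owner and a set of commit ids."""
    owner: str
    commits: set[str] = field(default_factory=set)

    def commit(self, commit_id: str) -> None:
        """Record a new local change."""
        self.commits.add(commit_id)

    def pull(self, other: "Repo") -> set[str]:
        """Copy over any commits `other` has that we lack; return the new ones."""
        new = other.commits - self.commits
        self.commits |= new
        return new


# Both repos start from the same history; neither is "central".
mine = Repo("me", {"c1"})
theirs = Repo("friend", {"c1"})

mine.commit("c2")    # I make a change and ping my friend...
theirs.pull(mine)    # ...who pulls it from me,
theirs.commit("c3")  # makes a further change,
mine.pull(theirs)    # and then I pull from them.

# After the round trip, both copies hold the same history.
assert mine.commits == theirs.commits
```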
	31	GitHub and similar services were built to support this paradigm: I
	32	can create a repository, and anyone who wants can go in and, with
	33	a click, _fork_ the repository, which copies it into their own
	34	namespace. You can pull changes from one or the other, or notify
	35	someone that you have changes you'd like them to consider with
	36	a _pull request_[^pr]. GitHub tends to think of the original copy
	37	of a repo as "canonical", but that's merely a convention.
	38
	39	[^pr]: I'd argue this is a poor piece of terminology: the phrase
	40	is ambiguous enough that I originally believed it had something
	41	to do with asking permission to clone someone's work. I've heard
	42	the alternative phrase _merge request_ used, which has the problem
	43	that it's less accurate to the underlying abstraction—it may or
	44	may not incur a merge in Git's sense—but it _does_ hint at the
	45	actual operation happening a bit more. I'm open to alternatives!
	46
	47	On the other hand, GitHub itself is _not_ decentralized: it offers
	48	a centralized toolset for managing a decentralized technology.
	49	This gets criticized regularly, especially when GitHub has a major
	50	outage and programmers all around the world can't get work done.
	51	GitHub also offers more tooling on top of just repo hosting and
	52	management: project wikis, issue tracking, commenting in various
	53	places, and so forth. So let's think about what a distributed
	54	_GitHub_ would look like.
	55
	56	# A Programmer's View
	57
	58	Let's call our distributed Git software _GitNode_.
	59
	60	Say I run an instance of GitNode on my personal web server at
	61	`http://gitnode.gdritter.com/`: this server contains copies of
	62	all the git repos I care about, and the GitNode server is aware
	63	of them, so if nothing else it gives me a nice browsable view of
	64	the state of the repo, past commits, and so forth.
	65
	66	My friend _also_ runs an instance of GitNode at
	67	`http://gitnode.example.com/`, and they have a repo I want to
	68	work on called `my-project`. There are two ways I can do this:
	69
	70	- I can go to my own GitNode instance and tell it the location
	71	of the repo I want to clone: that is, point it to
	72	`gitnode.example.com:my-project`, and it'll clone it for me.
	73	I could also write a bookmarklet or browser plugin to make this
	74	easier, to avoid having to retype or copy/paste things in.
	75	- I could also _perform a visit_ to my friend's GitNode instance.
	76	I navigate to `gitnode.example.com` and click a _Visit_ button,
	77	which