Addendum to the git/darcs talk: pozorvlak

Dec 01, 2010 15:03

After my git/darcs talk, some of the folks on darcs-users were kind enough to offer constructive criticism. In particular, Stephen Turnbull mentioned an interesting use-case which I want to discuss further.

As I tried to stress, the key insight required to translate between git-think and darcs-think is that in git, the natural thing to pull is a branch (an ordered list of commits, each of which is assumed to depend on those before it); in darcs, the natural thing to pull is a patch (a named change whose dependencies are calculated optimistically by the system).

Stephen's use-case is this: you're a release manager, and one of your hackers has written some code you want to pull in. However, you don't want all their code. Suppose the change is nicely isolated into a single commit. In darcs, you can pull in just that commit (and a minimal set of prior commits required for it to apply cleanly). This is as far as my thinking had got, but Stephen points out that the interesting part of the story is what happens next: if you subsequently pull in other changes that depend on that commit, then darcs will note that it's already in your repository and all will be well. This is true in git if the developer has helpfully isolated that change into a branch: you can pull that branch, and subsequent merges will take account of the fact that you've done so. However, if the developer hasn't been so considerate, then you're potentially in trouble: you can cherry-pick that commit (creating a new commit with the same effect), but if you subsequently pull a branch containing it then git will not take account of your having cherry-picked it earlier. If either of you has changed any of the lines affected by that diff, then you'll get conflicts.
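To make the contrast concrete, here is a toy model in Python. It is an illustration under heavy simplifying assumptions, not how either tool is implemented: real darcs works out dependencies by commuting patches, and a real git commit id hashes the whole commit object (tree, parents, author, message). The names `Patch`, `darcs_pull` and `commit_id`, and the patch names, are all hypothetical. The point it shows is that a darcs-style pull is keyed on the patch itself, so pulling "followup" later recognises that "feature" is already present, whereas a cherry-picked git commit gets a new identity because its parent differs.

```python
import hashlib
from dataclasses import dataclass

# --- darcs-style (toy): a patch is identified by name and knows its dependencies ---

@dataclass(frozen=True)
class Patch:
    name: str
    deps: frozenset = frozenset()  # names of patches this one needs

def darcs_pull(repo: set, wanted: str, source: dict) -> set:
    """Add `wanted` plus any missing transitive dependencies to `repo` (in place).
    Returns the set of patch names actually pulled."""
    pulled, stack = set(), [wanted]
    while stack:
        name = stack.pop()
        if name in repo or name in pulled:
            continue  # already present, so nothing to pull
        pulled.add(name)
        stack.extend(source[name].deps)
    repo |= pulled
    return pulled

# --- git-style (toy): a commit's identity includes its parent ---

def commit_id(parent: str, change: str) -> str:
    # Simplification: real git hashes tree, parents, author, message, etc.
    return hashlib.sha1(f"{parent}\n{change}".encode()).hexdigest()

if __name__ == "__main__":
    source = {
        "refactor": Patch("refactor"),
        "feature": Patch("feature", frozenset({"refactor"})),
        "unrelated": Patch("unrelated"),
        "followup": Patch("followup", frozenset({"feature"})),
    }
    mine: set = set()
    print(sorted(darcs_pull(mine, "feature", source)))   # ['feature', 'refactor']
    print(sorted(darcs_pull(mine, "followup", source)))  # ['followup']

    # Developer's branch: root -> refactor -> feature
    c1 = commit_id("root", "refactor")
    c2 = commit_id(c1, "feature")
    # Release branch cherry-picks "feature" onto its own history.
    r1 = commit_id("root", "release-work")
    picked = commit_id(r1, "feature")
    print(c2 in {"root", r1, picked})  # False: same change, different commit
```

In the darcs half, "unrelated" never comes along, and the second pull only fetches "followup". In the git half, nothing in the release history says that `c2` has already been applied, which is why a later merge of the developer's branch has to reconcile the same change a second time.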

Thinking about this further, this means that I was even righter than I realised. In the git view of the world, the fact that that commit is not in its own branch is an assertion that it only makes sense in the context of the rest of the branch. Attempting to pull it in on its own is therefore not useful. You can do it, of course - it's Unix, er, git, you can do anything - but you're making a rod for your own back. As I tried to emphasize in the talk, git-cherry-pick is a low-level, hackish tool, only really intended for use in drastic situations or in the privacy of your own local repo. If you want something semantically meaningful, only pull branches.

Git-using release managers, therefore, have to rely on developers to package atomic features sensibly into branches. If your developers can't be trusted to do this, you may have a problem. But note that darcs has the dual problem: if you can't trust your developers to specify semantic (non-textual) dependencies with darcs commit --ask-deps, then you're potentially going to be spending a lot of time tracking down semantic dependencies by hand. Having been a release manager under neither system, I don't have any intuition for which is worse - can anyone here shed any light?
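The difference between textual and semantic dependencies can be sketched with another toy model. The assumption here is a deliberately crude rule: a patch "textually" depends on an earlier one if they touch the same line of the same file (real darcs decides this by trying to commute patches). The `explicit_deps` field stands in for whatever the author would have answered at the `--ask-deps` prompt; the names and the `parse()` example are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Patch:
    name: str
    hunks: set  # {(filename, line)} pairs this patch touches
    # Hypothetical stand-in for dependencies declared by hand via --ask-deps.
    explicit_deps: set = field(default_factory=set)

def textual_deps(patch: Patch, earlier: list) -> set:
    """Earlier patches whose hunks overlap this one's (crude overlap rule)."""
    return {p.name for p in earlier if p.hunks & patch.hunks}

def all_deps(patch: Patch, earlier: list) -> set:
    """Textual dependencies plus whatever the author declared explicitly."""
    return textual_deps(patch, earlier) | patch.explicit_deps

define_parse = Patch("define parse()", {("parser.c", 10), ("parser.c", 11)})
tweak_parse = Patch("tweak parse()", {("parser.c", 11)})
# Calls parse(), so it needs define_parse, but shares no lines with it.
call_parse = Patch("call parse()", {("main.c", 42)})
declared = Patch("call parse()", {("main.c", 42)}, {"define parse()"})

print(textual_deps(tweak_parse, [define_parse]))  # {'define parse()'}
print(textual_deps(call_parse, [define_parse]))   # set()
print(all_deps(declared, [define_parse]))         # {'define parse()'}
```

The middle case is the one a darcs-using release manager has to worry about: without the declared dependency, pulling "call parse()" on its own would be allowed, and the breakage would only show up when someone tried to build.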

[The cynic in me suggests that any benefits to the Darcs approach would only become apparent in projects which are large enough to rule out the use of Darcs with its current performance, but I could, as ever, be completely wrong. And besides, not being very useful right now doesn't rule out its ultimately proving to be a superior solution.]

On another note: Mark Stosberg (who's written quite a lot on the differences between darcs and git himself) confirms that people actually do use spontaneous branches, with ticket numbers as the "branch" identifiers. Which got me thinking. Any git user can see that spontaneous branches are more work for the user than real branches, because you have to remember your ticket number and type it into your commit message every time. Does that sound like a trivial amount of work to complain about? That's because you have no idea how easy branching and merging are in git. But it's also work that can be automated away with some tool support. Stick a file somewhere in _darcs containing the name of the current ticket, and somehow prepend that to your commit messages. I have just written a Perl script to do that (GitHub repo, share and enjoy).
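For readers who want the idea without the Perl, here is a rough Python sketch of the same approach. It is not the script mentioned above. The file name `_darcs/current_ticket` and the `[ticket] ` prefix format are assumptions, since the post only says "a file somewhere in _darcs". It hands the tagged name to `darcs record -m`, which sets the patch name; darcs still asks interactively which hunks to record.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Hypothetical location: the post only says "a file somewhere in _darcs".
TICKET_FILE = Path("_darcs") / "current_ticket"

def current_ticket(repo_root: Path) -> Optional[str]:
    """Return the ticket name stored in the repo, or None if none is set."""
    path = repo_root / TICKET_FILE
    if not path.exists():
        return None
    return path.read_text().strip() or None

def tagged_name(message: str, ticket: Optional[str]) -> str:
    """Prepend '[ticket] ' to a patch name, unless there is no ticket
    or the name already carries that prefix."""
    if not ticket:
        return message
    prefix = f"[{ticket}] "
    return message if message.startswith(prefix) else prefix + message

def record(message: str, repo_root: Path = Path(".")) -> None:
    """Run `darcs record` with the ticket-tagged patch name."""
    name = tagged_name(message, current_ticket(repo_root))
    subprocess.run(["darcs", "record", "-m", name], cwd=repo_root, check=True)

if __name__ == "__main__" and len(sys.argv) > 1:
    record(" ".join(sys.argv[1:]))
```

Switching tickets is then just a matter of writing a new name into the file (for example `echo ticket-123 > _darcs/current_ticket`), and every subsequent patch name picks it up automatically.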

Now we just need the ability to easily back out and restore incomplete tickets without creating a whole new repo, and they'll be as convenient as git topic branches :-)

computers, programming, talks, beware the geek, git, darcs