Friday, January 4, 2008

Thinking in git

"git" is a distributed source code management tool that is gaining a lot of interest from many corners of the international software development community due to its incredible flexibility and power. I like to compare it to powerful text editors like Vim or Emacs: it has a steep learning curve, but once you pick it up it will increase your productivity immensely, and if you are a professional programmer who takes himself seriously, you owe it to yourself to learn it fully.

However, like all specialized tools, this new power tool requires you to readjust your way of thinking about how you work before you can take advantage of its benefits.


1. Your local clone is also a repository in its own right



A repository is a .git directory. Every functional instance of a .git directory represents a full-featured repository, regardless of how you came to possess it.

It's a deceptively simple but profound change in perspective. Ponder the implications fully.
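As a rough illustration of "a repository is a .git directory", here is a minimal Python sketch that guesses whether a directory is a usable repository. The function name is made up for this example, and the check (a HEAD file plus objects/ and refs/ directories) is only an approximation of what git itself verifies.

```python
from pathlib import Path

def looks_like_git_dir(path):
    """Heuristic only: a HEAD file plus objects/ and refs/ directories.

    Real git performs stricter checks; this is an illustrative sketch.
    """
    p = Path(path)
    return (
        (p / "HEAD").is_file()
        and (p / "objects").is_dir()
        and (p / "refs").is_dir()
    )
```

Calling looks_like_git_dir("some/clone/.git") on a fresh clone and on the repository it was cloned from should give the same answer, which is the point: neither is special.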


2. All repositories are created equal



There is no inherent security policy in git (you have to piggy-back on SSH or Apache for that), and there is no sense of hierarchy among repositories that you do not define yourself through external policy.

Once you clone a branch from a remote repository, the originating branch is technically no more authoritative than yours is. The only technical constraint on branching and merging across repositories is that all branches must have a coherent history.

If two branches in different repositories grow apart over time and you want to reunite them into a common public branch, whoever attempts the merge must reconcile the histories in such a way that other repositories can recognize the result as either a common ancestor of a local branch or a reachable future of their current state.


3. Branch everything



Branching and merging are the bread and butter of an SCM, so it hopefully goes without saying that in git branches are trivial to create and merges are as smart as a machine can make them. Use them.

Branch to test out an idea. Branch to test out a merge. Branch to impress your friends at cocktail parties.


4. Git is a low-level tool



Git was designed from the beginning to be a low-level storage format for a distributed SCM, and it's only recently that git has gained high-level commands for the kind of basic operations that programmers care most about. If you take the time to understand the basic representation that git uses, you will find it harder to get confused about why the high-level commands do things you didn't expect them to do.

Git assumes very little about a programmer's workflow, and converts your commands into rather simplistic and literal manipulations of the basic data structures within the .git directory, so it pays to take a look inside the internal representation once in a while to verify your assumptions.
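To make "take a look inside" concrete, here is a small Python sketch that reads the HEAD file and the branch file it points to. The function name is invented for this example, and it assumes the branch is stored as a plain file under refs/ (refs that git has packed into a packed-refs file are not handled).

```python
from pathlib import Path

def read_head(git_dir):
    """Return (ref_name, commit_id) by reading plain files under git_dir.

    Assumption: loose refs only; packed-refs is ignored in this sketch.
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Normal case: HEAD names a branch, e.g. "ref: refs/heads/master".
        ref = head[len("ref: "):]
        ref_file = git_dir / ref
        commit = ref_file.read_text().strip() if ref_file.is_file() else None
        return ref, commit
    # Detached HEAD: the file holds a commit id directly.
    return None, head
```

A branch, seen this way, is nothing more than a small text file holding the id of one commit, which is why branches are so cheap.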


5. You can break your own repository, and others'



Since git makes almost no assumptions about your workflow, contains no inherent security policy, and has no concept of repository hierarchy, it is very easy to do something that can seriously damage anyone's repository, including your own.

Git tries hard to be non-destructive when making changes, but it will always do exactly what you ask of it. And while an experienced git user can likely recover from a whole host of mistakes, there is no guarantee that you can.

While learning git, always back up your working source and .git directory regularly. Always make new branches to test out your actions before you proceed on important branches. And finally, take care when pushing/pulling changes to/from someone else's repository.


6. Every commit is a change, uniquely identified by a SHA-1 hash number



The basic currency of git is the commit: a recorded set of changes to the data structures in .git. Every commit is identified by the SHA-1 hash of its own contents, so two different commits will, for all practical purposes, never share an identifier.
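The identifier is not an arbitrary serial number. As a minimal Python sketch, git hashes a short header (the object type and byte length, followed by a NUL byte) together with the object's contents. The function name and the commit text below are made up for illustration.

```python
import hashlib

def git_object_id(obj_type, body):
    """SHA-1 of '<type> <length>\\0' followed by the raw object bytes."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# A hypothetical commit body; every field here is invented for the example.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1199404800 +0000\n"
    b"committer A U Thor <author@example.com> 1199404800 +0000\n"
    b"\n"
    b"Initial commit\n"
)

original = git_object_id("commit", commit_body)
amended = git_object_id("commit", commit_body + b"One more line\n")
```

Here original and amended differ, because any change to a commit's contents (its tree, parents, author, date, or message) produces a different identifier.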


7. A repository is a directed acyclic graph of commits, with named branches



As commits succeed commits, a linear history is built within the .git directory. When a branch is made it is given a name, and when a commit follows, the history becomes a tree and thus non-linear. When a merge happens, the two histories are joined together under one of the named branches.

Since history only proceeds in one direction, the graph is acyclic.
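The graph described above can be sketched in a few lines of Python. The commit names, the branch names, and the helper functions are all hypothetical; real commits are named by their SHA-1 identifiers, but single letters are easier to read.

```python
# Hypothetical history: each commit maps to the list of its parents.
history = {
    "A": [],
    "B": ["A"],
    "C": ["B"],        # committed on "master" after the branch point
    "D": ["B"],        # committed on "topic"
    "M": ["C", "D"],   # merge commit joining the two histories
}

# A branch is just a name attached to one commit.
branches = {"master": "M", "topic": "D"}

def ancestors(commit, history):
    """Return the commit itself plus everything reachable via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(history[current])
    return seen

def is_ancestor(older, newer, history):
    """True if `older` is part of the history leading up to `newer`."""
    return older in ancestors(newer, history)
```

With this data, is_ancestor("D", "M", history) is true because the merge absorbed the topic branch, while is_ancestor("C", "D", history) is false because the two branches had diverged. Parent links only ever point backwards in time, which is what keeps the graph acyclic.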


8. Commits and branches exist in your .git directory, not your working directory



Git stores the above DAG in compressed data structures inside your .git directory. The files that appear in your working copy are only a decompressed representation of those structures, after they have been processed by your git commands.
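To see the "decompressed representation" idea directly, here is a minimal Python sketch that reads one object back out of the .git directory. The function name is invented, and the sketch assumes the object is stored "loose" as its own zlib-compressed file under objects/; objects that git has bundled into pack files are not handled.

```python
import zlib
from pathlib import Path

def read_loose_object(git_dir, object_id):
    """Return (type, body) for a loose object; packed objects are not handled."""
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38>.
    path = Path(git_dir) / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    # The decompressed data is "<type> <size>\0" followed by the contents.
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("object size does not match its header")
    return obj_type, body
```

For a file's contents the type comes back as "blob", and the body is exactly the bytes that git writes into your working directory on checkout.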

If files appear to be out of place or missing, consult the git command line tools to inspect the .git repository first.
