Tuesday, November 24, 2009

Under the Hood of git clone

When you clone a git repository, everything is automatically setup to allow you to fetch, pull, push to and from the remote repository, origin. But what is really going on? git remote is configured with a few lines of configuration in the config file inside the .git/ directory.

Here’s how it works:

Create a new repository, called base, add a file to it, then commit.
$ mkdir base;cd base;git init
Initialized empty Git repository in /Users/andersjanmyr/tmp/repos/base/.git/
$ echo foo > bar.txt
$ git add .
$ git commit -m initial
[master (root-commit) 548d762] initial
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 bar.txt
Clone this repository, called klon.
$ cd ..
$ git clone base klon
Initialized empty Git repository in /Users/andersjanmyr/tmp/repos/klon/.git/
Initialize a new repository, called kopy.
$ mkdir kopy;cd kopy;git init
Initialized empty Git repository in /Users/andersjanmyr/tmp/repos/kopy/.git/
The difference in configuration between the klon and the kopy.
$ diff klon/.git/config kopy/.git/config
7,12d6
< [remote "origin"]
<       fetch = +refs/heads/*:refs/remotes/origin/*
<       url = /Users/andersjanmyr/tmp/repos/base
< [branch "master"]
<       remote = origin
<       merge = refs/heads/master

To set up the newly created repository to work the same way the clone does, all I have to do is to edit this file to make it look the same. This is not what git does, so lets do it the git way.

Fixing the remote configuration.
$ cd kopy
$ git remote add origin /Users/andersjanmyr/tmp/repos/base
This adds the [remote "origin"] entry to the config file.
[remote "origin"]
        url = /Users/andersjanmyr/tmp/repos/base
        fetch = +refs/heads/*:refs/remotes/origin/*
Fetching from the origin adds the remote heads to .git/.
$ git fetch
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /Users/andersjanmyr/tmp/repos/base
 * [new branch]      master     -> origin/master
$ find .git/refs
.git/refs/
.git/refs/heads
.git/refs/remotes
.git/refs/remotes/origin
.git/refs/remotes/origin/master
.git/refs/tags

Now I can check out the origin/master, but if I do it the normal way, the configuration will not be set up correctly to allow me to pull and push the way i can with the clone.

Checkout the master version with tracking information.
# DON'T DO THIS, It does not add the tracking information to the config file.
$ git checkout -b master origin/master

# This add the tracking information to the config file.
$ git checkout --track -b master origin/master
Branch master set up to track remote branch master from origin.
Already on 'master'
The following information is added to .git/config when --track is used.
[branch "master"]
        remote = origin
        merge = refs/heads/master

That’s it! Now the .git/config file looks the same as if I had done a normal clone, but lets continue. What do the entries in the config file mean.

Definition of the remote.
[remote "origin"]
        url = /Users/andersjanmyr/tmp/repos/base
        fetch = +refs/heads/*:refs/remotes/origin/*
Definition of the remote as git config commands.
$ git config remote.origin.url /Users/andersjanmyr/tmp/repos/base
$ git config remote.fetch = +refs/heads/*:refs/remotes/origin/*

The first part is just declaring the alias origin for the remote url (or local in this case :).

The second part of the definition is more interesting. It sets up the refspec that will be used if you don’t provide anything on the command line. As we usually don’t provide a full refspec, most people don’t know what it is, this is extremely useful. In case you don’t know, the remote commands of git, push, pull, and fetch take a refspec as their last parameter. It is just that we usually just refer to a small part of it.

Usage of the remote git commands.
git pull <options> <repository> <refspec>...
git fetch <options> <repository> <refspec>...
git push <options> <repository> <refspec>...

The format of a refspec parameter is an optional plus +, followed by the source ref src, followed by a colon :, followed by the destination ref dest.

It defines what dest object should be updated by the src object.

Example definition of refspec.
# The local
# +<src>:<dest>
+refs/heads/spike:refs/remotes/origin/master

In our day-to-day usage of git, we usually don’t use the full syntax of the refspec. Instead we just refer to simple names. Like this.

Day-to-day usage of refspecs.
# Push the local branch to the remote branch with the same name
$ git push origin
# Pull the master into the local master.
$ git pull origin master
# Fetch the master of the origin and put the result in the remote experimental
$ git fetch origin master:refs/remotes/origin/experimental

The above really means:

Definition of the branch, expanded
# Push the local branch to the remote branch with the same name
$ git push refs/heads/*:refs/remotes/origin/*
# Pull the master into the local master.
$ git pull origin refs/heads/master:refs/remotes/origin/master
# Fetch the master of the origin and put the result in the remote experimental
$ git fetch origin refs/heads/master:refs/remotes/origin/experimental

From the above syntax, it is also possible to decrypt the obscure syntax used when deleting a remote branch. Deleting is the same as pushing to a remote branch without giving a local branch.

Delete a remote branch.
# Delete the remote branch serverfix
git push origin :serverfix

Now, we are down to the last part of the configuration, the branch definition.

Definition of the branch
[branch "master"]
        remote = origin
        merge = refs/heads/master

The first part branch.master.remote, tells git to use origin as the default remote, if none is given for this local branch.

The second part tells git which remote branch to use when merging. This also affects pull and fetch. Depending on your settings of push.default, it will also affect push.

Hopefully this has clarified some of the intricacies of git remoting. Just remember that if you make a mistake, you can always fire up an editor and edit the config file directly.

I’ll finish up with some more commands that can be used to get information about the remote.

Additional remote commands, to explore a remote.
# Show all remote branches
$ git branch -r
  origin/cucumber
  origin/customercare-0.6.x
  origin/master

# Show all remotes verbosely
$ git remote -v
origin  /Users/andersjanmyr/tmp/repos/base (fetch)
origin  /Users/andersjanmyr/tmp/repos/base (push)

# Show info about the remote
$ git remote show origin
* remote origin
  Fetch URL: /Users/andersjanmyr/tmp/repos/base
  Push  URL: /Users/andersjanmyr/tmp/repos/base
  HEAD branch: master
  Remote branches:
    experimental stale (use 'git remote prune' to remove)
    master       tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)

# List the remote heads of the origin
$ git ls-remote --heads origin
548d7624f5385d36314e8ab61e61e8872c0bfe90        refs/heads/master

That’s it for today.

No comments: